RATIONALLY INATTENTIVE CONTROL
OF MARKOV PROCESSES*

EHSAN SHAFIEEPOORFARD†, MAXIM RAGINSKY†, AND SEAN P. MEYN‡


Abstract. The article poses a general model for optimal control subject to information
constraints, motivated in part by recent work of Sims and others on information-constrained
decision-making by economic agents. In the average-cost optimal control framework, the general
model introduced in this paper reduces to a variant of the linear-programming representation of
the average-cost optimal control problem, subject to an additional mutual information constraint
on the randomized stationary policy. The resulting optimization problem is convex and admits a
decomposition based on the Bellman error, which is the object of study in approximate dynamic
programming. The theory is illustrated through the example of an information-constrained
linear-quadratic-Gaussian (LQG) control problem. Some results on the infinite-horizon
discounted-cost criterion are also presented.
1. Introduction. The problem of optimization with imperfect information [5]
deals with situations where a decision maker (DM) does not have direct access to 
the exact value of a payoff-relevant variable. Instead, the DM receives a noisy signal 
pertaining to this variable and makes decisions conditionally on that signal. 

It is usually assumed that the observation channel that delivers the signal is fixed 
a priori. In this paper, we do away with this assumption and investigate a class of 
dynamic optimization problems, in which the DM is free to choose the observation 
channel from a certain convex set. This formulation is inspired by the framework 
of Rational Inattention, proposed by the well-known economist Christopher Sims
to model decision-making by agents who minimize expected cost given available
information (hence “rational”), but are capable of handling only a limited amount of
information (hence “inattention”) [28,29]. Quantitatively, this limitation is stated as
an upper bound on the mutual information in the sense of Shannon [25] between the 
state of the system and the signal available to the DM. 

Our goal in this paper is to initiate the development of a general theory of optimal 
control subject to mutual information constraints. We focus on the average-cost 
optimal control problem for Markov processes and show that the construction of 
an optimal information-constrained control law reduces to a variant of the linear- 
programming representation of the average-cost optimal control problem, subject to 
an additional mutual information constraint on the randomized stationary policy. The 
resulting optimization problem is convex and admits a decomposition in terms of the 
Bellman error, which is the object of study in approximate dynamic programming 
[5,22]. This decomposition reveals a fundamental connection between
information-constrained controller design and rate-distortion theory [4], a branch of
information theory that deals with optimal compression of data subject to information constraints.

Let us give a brief informal sketch of the problem formulation; precise definitions 
and regularity/measurability assumptions are spelled out in the sequel. Let X, U, and 
Z denote the state, the control (or action), and the observation spaces. The objective 
of the DM is to control a discrete-time state process $\{X_t\}_{t=1}^\infty$ with values in $\mathsf{X}$ by
means of a randomized control law (or policy) $\Phi(du_t|z_t)$, $t \ge 1$, which generates a
random action $U_t \in \mathsf{U}$ conditionally on the observation $Z_t \in \mathsf{Z}$. The observation $Z_t$,
in turn, depends stochastically on the current state $X_t$ according to an observation
model (or information structure) $W(dz_t|x_t)$. Given the current action $U_t = u_t$ and
the current state $X_t = x_t$, the next state $X_{t+1}$ is determined by the state transition
law $Q(dx_{t+1}|x_t,u_t)$. Given a one-step state-action cost function $c : \mathsf{X} \times \mathsf{U} \to \mathbb{R}^+$ and
the initial state distribution $\mu = \mathrm{Law}(X_1)$, the pathwise long-term average cost of any
pair $(\Phi, W)$ consisting of a policy and an observation model is given by

$$J_\mu(\Phi, W) \triangleq \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} c(X_t, U_t),$$

where the law of the process $\{(X_t, Z_t, U_t)\}$ is induced by the pair $(\Phi, W)$ and by the
law $\mu$ of $X_1$; for notational convenience, we will suppress the dependence on the fixed
state transition dynamics $Q$.

If the information structure $W$ is fixed, then we have a Partially Observable
Markov Decision Process, where the objective of the DM is to pick a policy $\Phi^*$ to
minimize $J_\mu(\Phi, W)$. In the framework of rational inattention, however, the DM is also
allowed to optimize the choice of the information structure $W$ subject to a mutual
information constraint. Thus, the DM faces the following optimization problem:¹

$$\text{minimize } J_\mu(\Phi, W) \tag{1.1a}$$
$$\text{subject to } \limsup_{t \to \infty} I(X_t; Z_t) \le R \tag{1.1b}$$

where $I(X_t; Z_t)$ denotes the Shannon mutual information between the state and the
observation at time $t$, and $R \ge 0$ is a given constraint value. The mutual information
quantifies the amount of statistical dependence between $X_t$ and $Z_t$; in particular, it is
equal to zero if and only if $X_t$ and $Z_t$ are independent, so the limit $R \to 0$ corresponds
to open-loop policies. If $R < \infty$, then the act of generating the observation $Z_t$ will in
general involve loss of information about the state $X_t$ (the case of perfect information
corresponds to taking $R \to \infty$). However, for a given value of $R$, the DM is allowed to
optimize the observation model $W$ and the control law $\Phi$ jointly to make the best use
of all available information. In light of this, it is also reasonable to grant the DM the
freedom to optimize the choice of the observation space $\mathsf{Z}$, i.e., to choose the optimal
representation for the data supplied to the controller. In fact, it is precisely this
additional freedom that enables the reduction of the rationally inattentive optimal
control problem to an infinite-dimensional convex program.

This paper addresses the following problems: (a) give existence results for optimal
information-constrained control policies; (b) describe the structure of such policies;
and (c) derive an information-constrained analogue of the Average-Cost Optimality
Equation (ACOE). Items (a) and (b) are covered by Theorem 5.5, whereas Item (c) is
covered by Theorem 5.6 and subsequent discussion in Section 5.3. We will illustrate
the general theory through the specific example of an information-constrained Linear
Quadratic Gaussian (LQG) control problem. Finally, we will outline an extension of
our approach to the more difficult infinite-horizon discounted-cost case.

¹Since $J_\mu(\Phi, W)$ is a random variable that depends on the entire path $\{(X_t, U_t)\}$, the definition
of a minimizing pair $(\Phi, W)$ requires some care. The details are spelled out in Section 3.

1.1. Relevant literature. In the economics literature, the rational inattention 
model has been used to explain certain memory effects in different economic equilibria 
[30], to model various situations such as portfolio selection [16] or Bayesian learning 
[24], and to address some puzzles in macroeconomics and finance [19,35,36]. However, 
most of these results rely on heuristic considerations or on simplifying assumptions 
pertaining to the structure of observation channels. 

On the other hand, dynamic optimization problems where the DM observes the 
system state through an information-limited channel have been long studied by control 
theorists (a very partial list of references is [1,3,6,33,34,37,42]). Most of this literature 
focuses on the case when the channel is fixed, and the controller must be supplemented 
by a suitable encoder/decoder pair respecting the information constraint and any
considerations of causality and delay. Notable exceptions include classic results of Bansal
and Başar [1,3] and recent work of Yüksel and Linder [42]. The former is concerned
with a linear-quadratic-Gaussian (LQG) control problem, where the DM must jointly 
optimize a linear observation channel and a control law to minimize expected state- 
action cost, while satisfying an average power constraint; information-theoretic ideas 
are used to simplify the problem by introducing a certain sufficient statistic. The 
latter considers a general problem of selecting optimal observation channels in static 
and dynamic stochastic control problems, but focuses mainly on abstract structural 
results pertaining to existence of optimal channels and to continuity of the optimal 
cost in various topologies on the space of observation channels. 

The paper is organized as follows: The next section introduces the notation and 
the necessary information-theoretic preliminaries. Problem formulation is given in 
Section 3, followed by a brief exposition of rate-distortion theory in Section 4. In 
Section 5, we present our analysis of the problem via a synthesis of rate-distortion 
theory and the convex-analytic approach to Markov decision processes (see, e.g., [8]). 
We apply the theory to an information-constrained variant of the LQG control
problem in Section 6. All of these results pertain to the average-cost criterion; the more
difficult infinite-horizon discounted-cost criterion is considered in Section 7. Certain
technical and auxiliary results are relegated to Appendices.

Preliminary versions of some of the results were reported in [27] and [26]. 

2. Preliminaries and notation. All spaces are assumed to be standard Borel 
(i.e., isomorphic to a Borel subset of a complete separable metric space); any such 
space will be equipped with its Borel $\sigma$-field $\mathcal{B}(\cdot)$. We will repeatedly use standard
notions and results from probability theory, as briefly listed below; we refer the reader
to the text by Kallenberg [17] for details. The space of all probability measures on
$(\mathsf{X}, \mathcal{B}(\mathsf{X}))$ will be denoted by $\mathcal{P}(\mathsf{X})$; the sets of all measurable functions and all bounded
continuous functions $\mathsf{X} \to \mathbb{R}$ will be denoted by $M(\mathsf{X})$ and by $C_b(\mathsf{X})$, respectively.
We use the standard linear-functional notation for expectations: given an $\mathsf{X}$-valued
random object $X$ with $\mathrm{Law}(X) = \mu \in \mathcal{P}(\mathsf{X})$ and $f \in L^1(\mu) \subset M(\mathsf{X})$,

$$\langle \mu, f \rangle \triangleq \int_{\mathsf{X}} f(x)\,\mu(dx) = \mathbb{E}[f(X)].$$
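A Markov (or stochastic) kernel with input space $\mathsf{X}$ and output space $\mathsf{Y}$ is a mapping
$K(\cdot|\cdot) : \mathcal{B}(\mathsf{Y}) \times \mathsf{X} \to [0,1]$, such that $K(\cdot|x) \in \mathcal{P}(\mathsf{Y})$ for all $x \in \mathsf{X}$ and $x \mapsto K(B|x) \in M(\mathsf{X})$
for every $B \in \mathcal{B}(\mathsf{Y})$. We denote the space of all such kernels by $\mathcal{M}(\mathsf{Y}|\mathsf{X})$. Any
$K \in \mathcal{M}(\mathsf{Y}|\mathsf{X})$ acts on $f \in M(\mathsf{Y})$ from the left and on $\mu \in \mathcal{P}(\mathsf{X})$ from the right:

$$Kf(\cdot) \triangleq \int_{\mathsf{Y}} f(y)\,K(dy|\cdot), \qquad \mu K(\cdot) \triangleq \int_{\mathsf{X}} K(\cdot|x)\,\mu(dx).$$

Note that $Kf \in M(\mathsf{X})$ for any $f \in M(\mathsf{Y})$, and $\mu K \in \mathcal{P}(\mathsf{Y})$ for any $\mu \in \mathcal{P}(\mathsf{X})$. Given
a probability measure $\mu \in \mathcal{P}(\mathsf{X})$ and a Markov kernel $K \in \mathcal{M}(\mathsf{Y}|\mathsf{X})$, we denote by
$\mu \otimes K$ a probability measure defined on the product space $(\mathsf{X} \times \mathsf{Y}, \mathcal{B}(\mathsf{X}) \otimes \mathcal{B}(\mathsf{Y}))$ via
its action on the rectangles $A \times B$, $A \in \mathcal{B}(\mathsf{X})$, $B \in \mathcal{B}(\mathsf{Y})$:

$$(\mu \otimes K)(A \times B) \triangleq \int_{A} K(B|x)\,\mu(dx).$$

If we let $A = \mathsf{X}$ in the above definition, then we end up with $\mu K(B)$. Note that
product measures $\mu \otimes \nu$, where $\nu \in \mathcal{P}(\mathsf{Y})$, arise as a special case of this construction,
since any $\nu \in \mathcal{P}(\mathsf{Y})$ can be realized as a Markov kernel $(B,x) \mapsto \nu(B)$.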
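On finite spaces the notation above reduces to elementary matrix arithmetic, which can help fix ideas. The following is only a sketch under the assumption that both spaces are finite (the paper itself works on standard Borel spaces); the helper names are hypothetical and chosen for this illustration. A kernel is stored as a row-stochastic matrix, and the code checks the duality $\langle \mu K, f \rangle = \langle \mu, Kf \rangle$ and that the second marginal of $\mu \otimes K$ is $\mu K$.

```python
# Finite-alphabet sketch (assumption: X and Y are finite, indexed 0..n-1).
# A kernel K in M(Y|X) is stored as a row-stochastic matrix: K[x][y] = K({y}|x).

def kernel_on_function(K, f):
    """Left action: (Kf)(x) = sum_y f(y) K(y|x)."""
    return [sum(K_x[y] * f[y] for y in range(len(f))) for K_x in K]

def measure_through_kernel(mu, K):
    """Right action: (mu K)(y) = sum_x K(y|x) mu(x)."""
    return [sum(mu[x] * K[x][y] for x in range(len(mu))) for y in range(len(K[0]))]

def joint_law(mu, K):
    """Joint law (mu ⊗ K)(x, y) = mu(x) K(y|x)."""
    return [[mu[x] * K_xy for K_xy in K[x]] for x in range(len(mu))]

def pairing(mu, f):
    """Linear-functional notation <mu, f> = sum_x f(x) mu(x)."""
    return sum(m * v for m, v in zip(mu, f))

mu = [0.25, 0.75]                 # state distribution on X = {0, 1}
K = [[0.9, 0.1, 0.0],             # K(.|x=0)
     [0.2, 0.3, 0.5]]             # K(.|x=1)
f = [1.0, 2.0, 4.0]               # test function on Y = {0, 1, 2}

muK = measure_through_kernel(mu, K)
Kf = kernel_on_function(K, f)
joint = joint_law(mu, K)
y_marginal = [sum(row[y] for row in joint) for y in range(3)]
```

In this representation a product measure $\mu \otimes \nu$ corresponds to a matrix `K` whose rows are all equal to `nu`.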

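We also need some notions from information theory. The relative entropy (or
information divergence) [25] between any two probability measures $\mu, \nu \in \mathcal{P}(\mathsf{X})$ is

$$D(\mu \| \nu) \triangleq \begin{cases} \left\langle \mu, \log \dfrac{d\mu}{d\nu} \right\rangle, & \text{if } \mu \prec \nu \\ +\infty, & \text{otherwise} \end{cases}$$

where $\prec$ denotes absolute continuity of measures, and $d\mu/d\nu$ is the Radon-Nikodym
derivative. It is always nonnegative, and is equal to zero if and only if $\mu = \nu$. The
Shannon mutual information [25] in $(\mu, K) \in \mathcal{P}(\mathsf{X}) \times \mathcal{M}(\mathsf{Y}|\mathsf{X})$ is

$$I(\mu, K) \triangleq D(\mu \otimes K \,\|\, \mu \otimes \mu K). \tag{2.1}$$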

The functional $I(\mu, K)$ is concave in $\mu$, convex in $K$, and weakly lower semicontinuous
in the joint law $\mu \otimes K$: for any two sequences $\{\mu_n\}_{n=1}^\infty \subset \mathcal{P}(\mathsf{X})$ and $\{K_n\}_{n=1}^\infty \subset \mathcal{M}(\mathsf{Y}|\mathsf{X})$
such that $\mu_n \otimes K_n \to \mu \otimes K$ weakly, we have

$$\liminf_{n \to \infty} I(\mu_n, K_n) \ge I(\mu, K) \tag{2.2}$$

(indeed, if $\mu_n \otimes K_n$ converges to $\mu \otimes K$ weakly, then, by considering test functions
in $C_b(\mathsf{X})$ and $C_b(\mathsf{Y})$, we see that $\mu_n \to \mu$ and $\mu_n K_n \to \mu K$ weakly as well; Eq. (2.2)
then follows from the fact that the relative entropy is weakly lower-semicontinuous in
both of its arguments [25]). If $(X, Y)$ is a pair of random objects with $\mathrm{Law}(X, Y) =
\Gamma = \mu \otimes K$, then we will also write $I(X; Y)$ or $I(\Gamma)$ for $I(\mu, K)$. In this paper,
we use natural logarithms, so mutual information is measured in nats. The mutual
information admits the following variational representation [32]:

$$I(\mu, K) = \inf_{\nu \in \mathcal{P}(\mathsf{Y})} D(\mu \otimes K \,\|\, \mu \otimes \nu), \tag{2.3}$$

where the infimum is achieved by $\nu = \mu K$. It also satisfies an important relation
known as the data processing inequality: Let $(X, Y, Z)$ be a triple of jointly distributed
random objects, such that $X$ and $Z$ are conditionally independent given $Y$. Then

$$I(X; Z) \le I(X; Y). \tag{2.4}$$

In words, no additional processing can increase information.
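For intuition, the definitions (2.1) and (2.3) can be evaluated directly when the alphabets are finite. The sketch below assumes finite $\mathsf{X}$ and $\mathsf{Y}$ and uses hypothetical function names; it is an illustration, not part of the paper's development. It computes $I(\mu, K)$ for a binary symmetric channel, checks that a kernel that ignores its input carries zero information, and confirms that replacing $\mu K$ by another $\nu$ in (2.3) increases the divergence by exactly $D(\mu K \| \nu)$.

```python
import math

def relative_entropy(p, q):
    """D(p||q) in nats for finite distributions (flat lists); +inf unless p << q."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i == 0.0:
                return math.inf
            total += p_i * math.log(p_i / q_i)
    return total

def divergence_to_product(mu, K, nu):
    """D(mu ⊗ K || mu ⊗ nu), the objective minimized over nu in Eq. (2.3)."""
    joint = [mu[x] * K[x][y] for x in range(len(mu)) for y in range(len(nu))]
    product = [mu[x] * nu[y] for x in range(len(mu)) for y in range(len(nu))]
    return relative_entropy(joint, product)

def mutual_information(mu, K):
    """I(mu, K) = D(mu ⊗ K || mu ⊗ muK), Eq. (2.1)."""
    muK = [sum(mu[x] * K[x][y] for x in range(len(mu))) for y in range(len(K[0]))]
    return divergence_to_product(mu, K, muK)

# Binary symmetric channel with crossover 0.1 and a uniform input.
mu = [0.5, 0.5]
bsc = [[0.9, 0.1], [0.1, 0.9]]
info_bsc = mutual_information(mu, bsc)

# A kernel that ignores its input yields independent (X, Y), hence zero information.
blind = [[0.3, 0.7], [0.3, 0.7]]
info_blind = mutual_information(mu, blind)

# Any nu other than the output marginal muK gives a larger divergence in (2.3).
gap = divergence_to_product(mu, bsc, [0.2, 0.8]) - info_bsc
```

For the binary symmetric channel the result agrees with the textbook value $\log 2 - h(0.1)$ nats, where $h$ is the binary entropy function.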
Fig. 3.1. System model.


3. Problem formulation and simplification. We now give a more precise
formulation for the problem (1.1) and take several simplifying steps towards its solution.
We consider a model with a block diagram shown in Figure 3.1, where the DM is
constrained to observe the state of the controlled system through an information-limited
channel. The model is fully specified by the following ingredients:

(M.1) the state, observation and control spaces denoted by $\mathsf{X}$, $\mathsf{Z}$ and $\mathsf{U}$ respectively;
(M.2) the (time-invariant) controlled system, specified by a stochastic kernel
$Q \in \mathcal{M}(\mathsf{X}|\mathsf{X} \times \mathsf{U})$ that describes the dynamics of the system state, initially
distributed according to $\mu \in \mathcal{P}(\mathsf{X})$;
(M.3) the observation channel, specified by a stochastic kernel $W \in \mathcal{M}(\mathsf{Z}|\mathsf{X})$;
(M.4) the feedback controller, specified by a stochastic kernel $\Phi \in \mathcal{M}(\mathsf{U}|\mathsf{Z})$.

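The $\mathsf{X}$-valued state process $\{X_t\}$, the $\mathsf{Z}$-valued observation process $\{Z_t\}$, and the
$\mathsf{U}$-valued control process $\{U_t\}$ are realized on the canonical path space $(\Omega, \mathcal{F}, P_\mu^{\Phi,W})$,
where $\Omega = \mathsf{X}^{\mathbb{N}} \times \mathsf{Z}^{\mathbb{N}} \times \mathsf{U}^{\mathbb{N}}$, $\mathcal{F}$ is the Borel $\sigma$-field of $\Omega$, and for every $t \ge 1$

$$X_t(\omega) = x(t), \qquad Z_t(\omega) = z(t), \qquad U_t(\omega) = u(t)$$

with $\omega = (x, z, u) = \big((x(1), x(2), \ldots), (z(1), z(2), \ldots), (u(1), u(2), \ldots)\big)$. The process
distribution satisfies $P_\mu^{\Phi,W}(X_1 \in \cdot) = \mu$, and

$$P_\mu^{\Phi,W}(Z_t \in \cdot \,|\, X^t, Z^{t-1}, U^{t-1}) = W(\cdot|X_t)$$
$$P_\mu^{\Phi,W}(U_t \in \cdot \,|\, X^t, Z^t, U^{t-1}) = \Phi(\cdot|Z_t)$$
$$P_\mu^{\Phi,W}(X_{t+1} \in \cdot \,|\, X^t, Z^t, U^t) = Q(\cdot|X_t, U_t).$$

Here and elsewhere, $X^t$ denotes the tuple $(X_1, \ldots, X_t)$; the same applies to $Z^t$, $U^t$,
etc. This specification ensures that, for each $t$, the next state $X_{t+1}$ is conditionally
independent of $X^{t-1}, Z^t, U^{t-1}$ given $X_t, U_t$ (which is the usual case of a controlled
Markov process), that the control $U_t$ is conditionally independent of $X^t, Z^{t-1}, U^{t-1}$
given $Z_t$, and that the observation $Z_t$ is conditionally independent of $X^{t-1}, Z^{t-1}, U^{t-1}$
given the most recent state $X_t$. In other words, at each time $t$ the controller takes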
as input only the most recent observation Zt, which amounts to the assumption that 
there is a separation structure between the observation channel and the controller. 
This assumption is common in the literature [33,34,37]. We also assume that the 
observation Zt depends only on the current state Xt; this assumption appears to be 
rather restrictive, but, as we show in Appendix A, it entails no loss of generality under 
the above separation structure assumption. 

We now return to the information-constrained control problem stated in Eq. (1.1).
If we fix the observation space $\mathsf{Z}$, then the problem of finding an optimal pair $(W, \Phi)$
is difficult even in the single-stage ($T = 1$) case. Indeed, if we fix $W$, then the
Bayes-optimal choice of the control law $\Phi$ is to minimize the expected posterior cost:

$$\Phi^*_W(du|z) = \delta_{u^*(z)}(du), \quad \text{where } u^*(z) = \arg\min_{u \in \mathsf{U}} \mathbb{E}[c(X, u) \,|\, Z = z].$$

Thus, the problem of finding the optimal $W^*$ reduces to minimizing the functional

$$W \mapsto \inf_{\Phi \in \mathcal{M}(\mathsf{U}|\mathsf{Z})} \int_{\mathsf{X} \times \mathsf{U} \times \mathsf{Z}} \mu(dx)\,W(dz|x)\,\Phi(du|z)\,c(x,u)$$

over the convex set $\{W \in \mathcal{M}(\mathsf{Z}|\mathsf{X}) : I(\mu, W) \le R\}$. However, this functional is
concave, since it is given by a pointwise infimum of affine functionals. Hence, the problem
of jointly optimizing $(W, \Phi)$ for a fixed observation space $\mathsf{Z}$ is nonconvex even in the
simplest single-stage setting. This lack of convexity is common in control problems
with “nonclassical” information structures [18].

Now, from the viewpoint of rational inattention, the objective of the DM is to 
make the best possible use of all available information subject only to the mutual 
information constraint. From this perspective, fixing the observation space Z could 
be interpreted as suboptimal. Indeed, we now show that if we allow the DM an 
additional freedom to choose Z, and not just the information structure W, then we 
may simplify the problem by collapsing the three decisions of choosing $\mathsf{Z}$, $W$, $\Phi$ into
one of choosing a Markov randomized stationary (MRS) control law $\Phi \in \mathcal{M}(\mathsf{U}|\mathsf{X})$
satisfying the information constraint $\limsup_{t \to \infty} I(\mu_t, \Phi) \le R$, where $\mu_t = P_\mu^\Phi(X_t \in \cdot)$
is the distribution of the state at time $t$, and $P_\mu^\Phi$ denotes the process distribution of
$\{(X_t, U_t)\}_{t=1}^\infty$, under which $P_\mu^\Phi(X_1 \in \cdot) = \mu$, $P_\mu^\Phi(U_t \in \cdot \,|\, X^t, U^{t-1}) = \Phi(\cdot|X_t)$, and
$P_\mu^\Phi(X_{t+1} \in \cdot \,|\, X^t, U^t) = Q(\cdot|X_t, U_t)$. Indeed, fix an arbitrary triple $(\mathsf{Z}, W, \Phi)$, such
that the information constraint (1.1b) is satisfied w.r.t. $P_\mu^{\Phi,W}$:

$$\limsup_{t \to \infty} I(X_t; Z_t) \le R. \tag{3.1}$$

Now consider a new triple $(\mathsf{Z}', W', \Phi')$ with $\mathsf{Z}' = \mathsf{U}$, $W' = \Phi \circ W$, and $\Phi'(du|z) =
\delta_z(du)$, where $\delta_z$ is the Dirac measure centered at $z$. Then obviously $P\big((X_t, U_t) \in \cdot\big)$
is the same in both cases, so that $J_\mu(\Phi', W') = J_\mu(\Phi, W)$. On the other hand, from
(3.1) and from the data processing inequality (2.4) we get

$$\limsup_{t \to \infty} I(\mu_t, W') = \limsup_{t \to \infty} I(\mu_t, \Phi \circ W) \le \limsup_{t \to \infty} I(\mu_t, W) \le R,$$

so the information constraint is still satisfied. Conceptually, this reduction describes 
a DM who receives perfect information about the state Xt, but must discard some of 
this information “along the way” to satisfy the information constraint. 
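The reduction from a triple $(\mathsf{Z}, W, \Phi)$ to the composite kernel $\Phi \circ W$ can be checked numerically at a single time step. The following is a minimal sketch assuming finite alphabets and an arbitrarily chosen state law; the names `compose` and `mutual_information` are hypothetical helpers written for this example. It verifies that the law of $(X_t, U_t)$ is unchanged by the reduction and that the information does not increase, as the data processing inequality (2.4) predicts.

```python
import math

def compose(W, Phi):
    """Composite kernel (Phi o W)(u|x) = sum_z Phi(u|z) W(z|x)."""
    return [[sum(W_x[z] * Phi[z][u] for z in range(len(Phi)))
             for u in range(len(Phi[0]))] for W_x in W]

def mutual_information(mu, K):
    """I(mu, K) in nats for a finite input law mu and row-stochastic kernel K."""
    out = [sum(mu[x] * K[x][y] for x in range(len(mu))) for y in range(len(K[0]))]
    return sum(mu[x] * K[x][y] * math.log(K[x][y] / out[y])
               for x in range(len(mu)) for y in range(len(out))
               if mu[x] * K[x][y] > 0.0)

mu = [0.5, 0.3, 0.2]                       # law of X_t on 3 states (illustrative)
W = [[0.8, 0.2], [0.3, 0.7], [0.5, 0.5]]   # observation channel, Z has 2 points
Phi = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.7]]   # controller, U has 3 points

# Original triple (Z, W, Phi): law of (X, U) obtained by summing out Z.
law_old = [[sum(mu[x] * W[x][z] * Phi[z][u] for z in range(2)) for u in range(3)]
           for x in range(3)]

# New triple (Z', W', Phi') with Z' = U, W' = Phi o W, Phi'(.|z) = delta_z.
W_new = compose(W, Phi)
Phi_new = [[1.0 if u == z else 0.0 for u in range(3)] for z in range(3)]
law_new = [[sum(mu[x] * W_new[x][z] * Phi_new[z][u] for z in range(3)) for u in range(3)]
           for x in range(3)]

info_old = mutual_information(mu, W)       # I(mu, W)
info_new = mutual_information(mu, W_new)   # I(mu, Phi o W)
```

Since the cost depends only on the law of $(X_t, U_t)$, equality of `law_old` and `law_new` is what makes the two triples interchangeable from the DM's point of view.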

In light of the foregoing observations, from now on we let Zt = Xt and focus on 
the following information-constrained optimal control problem: 


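$$\text{minimize } J_\mu(\Phi) \triangleq \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} c(X_t, U_t) \tag{3.2a}$$
$$\text{subject to } \limsup_{t \to \infty} I(\mu_t, \Phi) \le R. \tag{3.2b}$$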

Here, the limit supremum in (3.2a) is a random variable that depends on the entire
path $\{(X_t, U_t)\}_{t=1}^\infty$, and the precise meaning of the minimization problem in (3.2a) is
as follows: We say that an MRS control law $\Phi^*$ satisfying the information constraint
(3.2b) is optimal for (3.2a) if

$$J_\mu(\Phi^*) = \inf_{\Phi} \bar{J}_\mu(\Phi), \quad P_\mu^{\Phi^*}\text{-a.s.} \tag{3.3}$$

where

$$\bar{J}_\mu(\Phi) \triangleq \limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\mu^{\Phi}\left[ \sum_{t=1}^{T} c(X_t, U_t) \right] \tag{3.4}$$

is the long-term expected average cost of MRS $\Phi$ with initial state distribution $\mu$, and
where the infimum on the right-hand side of Eq. (3.3) is over all MRS control laws $\Phi$
satisfying the information constraint (3.2b) (see, e.g., [14, p. 116] for the definition of
pathwise average-cost optimality in the information-unconstrained setting). However,
we will see that, under general conditions, $J_\mu(\Phi^*)$ is deterministic and independent
of the initial condition.

4. One-stage problem: solution via rate-distortion theory. Before we 
analyze the average-cost problem (3.2), we show that the one-stage case can be solved 
completely using rate-distortion theory [4] (a branch of information theory that deals 
with optimal compression of data subject to information constraints). Then, in the 
following section, we will tackle (3.2) by reducing it to a suitable one-stage problem. 

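With this in mind, we consider the following problem:

$$\text{minimize } \langle \mu \otimes \Phi, c \rangle \tag{4.1a}$$
$$\text{subject to } \Phi \in \mathcal{I}_\mu(R) \tag{4.1b}$$

for a given probability measure $\mu \in \mathcal{P}(\mathsf{X})$ and a given $R \ge 0$, where

$$\mathcal{I}_\mu(R) \triangleq \{\Phi \in \mathcal{M}(\mathsf{U}|\mathsf{X}) : I(\mu, \Phi) \le R\}. \tag{4.2}$$

The set $\mathcal{I}_\mu(R)$ is nonempty for every $R \ge 0$. To see this, note that any kernel
$\Phi_0 \in \mathcal{M}(\mathsf{U}|\mathsf{X})$ for which the function $x \mapsto \Phi_0(B|x)$ is constant ($\mu$-a.e. for any $B \in \mathcal{B}(\mathsf{U})$)
satisfies $I(\mu, \Phi_0) = 0$. Moreover, this set is convex since the functional $\Phi \mapsto I(\mu, \Phi)$ is
convex for any fixed $\mu$. Thus, the optimization problem (4.1) is convex, and its value
is called the Shannon distortion-rate function (DRF) of $\mu$:

$$D_\mu(R; c) \triangleq \inf_{\Phi \in \mathcal{I}_\mu(R)} \langle \mu \otimes \Phi, c \rangle. \tag{4.3}$$

In order to study the existence and the structure of a control law that achieves
the infimum in (4.3), it is convenient to introduce the Lagrangian relaxation

$$\mathcal{L}_\mu(\Phi, \nu, s) \triangleq s\,D(\mu \otimes \Phi \,\|\, \mu \otimes \nu) + \langle \mu \otimes \Phi, c \rangle, \qquad s \ge 0,\ \nu \in \mathcal{P}(\mathsf{U}).$$

From the variational formula (2.3) and the definition (4.3) of the DRF it follows that

$$\inf_{\Phi \in \mathcal{M}(\mathsf{U}|\mathsf{X})}\ \inf_{\nu \in \mathcal{P}(\mathsf{U})} \mathcal{L}_\mu(\Phi, \nu, s) \le sR + D_\mu(R; c).$$

Then we have the following key result [10]: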

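Proposition 4.1. The DRF $D_\mu(R; c)$ is convex and nonincreasing in $R$. Moreover,
assume the following:

(D.1) The cost function $c$ is lower semicontinuous, satisfies

$$\inf_{u \in \mathsf{U}} c(x, u) > -\infty, \quad \forall x \in \mathsf{X}$$

and is also coercive: there exist two sequences of compact sets $\mathsf{X}_n \uparrow \mathsf{X}$ and
$\mathsf{U}_n \uparrow \mathsf{U}$ such that

$$\lim_{n \to \infty}\ \inf_{x \in \mathsf{X}_n^c,\ u \in \mathsf{U}_n^c} c(x, u) = +\infty.$$

(D.2) There exists some $u_0 \in \mathsf{U}$ such that $\langle \mu, c(\cdot, u_0) \rangle < \infty$.

Define the critical rate

$$R_0 \triangleq \inf\left\{ R \ge 0 : D_\mu(R; c) = \left\langle \mu, \inf_{u \in \mathsf{U}} c(\cdot, u) \right\rangle \right\}$$

(it may take the value $+\infty$). Then, for any $R < R_0$ there exists a Markov kernel
$\Phi^* \in \mathcal{M}(\mathsf{U}|\mathsf{X})$ satisfying $I(\mu, \Phi^*) = R$ and $\langle \mu \otimes \Phi^*, c \rangle = D_\mu(R; c)$. Moreover, the
Radon-Nikodym derivative of the joint law $\mu \otimes \Phi^*$ w.r.t. the product of its marginals
satisfies

$$\frac{d(\mu \otimes \Phi^*)}{d(\mu \otimes \mu\Phi^*)}(x, u) = \alpha(x)\, e^{-\frac{1}{s} c(x,u)} \tag{4.4}$$

where $\alpha : \mathsf{X} \to \mathbb{R}^+$ and $s \ge 0$ are such that

$$\int_{\mathsf{X}} \alpha(x)\, e^{-\frac{1}{s} c(x,u)}\, \mu(dx) \le 1, \quad \forall u \in \mathsf{U} \tag{4.5}$$

and $-s$ is the slope of a line tangent to the graph of $D_\mu(R; c)$ at $R$:

$$D_\mu(R'; c) + sR' \ge D_\mu(R; c) + sR, \quad \forall R' \ge 0. \tag{4.6}$$

For any $R \ge R_0$, there exists a Markov kernel $\Phi^* \in \mathcal{M}(\mathsf{U}|\mathsf{X})$ satisfying

$$\langle \mu \otimes \Phi^*, c \rangle = \left\langle \mu, \inf_{u \in \mathsf{U}} c(\cdot, u) \right\rangle$$

and $I(\mu, \Phi^*) = R_0$. This Markov kernel is deterministic, and is implemented by
$\Phi^*(du|x) = \delta_{u^*(x)}(du)$, where $u^*(x)$ is any minimizer of $c(x, u)$ over $u$.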

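Upon substituting (4.4) back into (4.3) and using (4.5) and (4.6), we get the
following variational representation of the DRF:

Proposition 4.2. Under the conditions of Prop. 4.1, the DRF $D_\mu(R; c)$ can be
expressed as

$$D_\mu(R; c) = \sup_{s \ge 0}\ \inf_{\nu \in \mathcal{P}(\mathsf{U})} s \left[ \left\langle \mu, \log \frac{1}{\int_{\mathsf{U}} e^{-\frac{1}{s} c(\cdot,u)}\, \nu(du)} \right\rangle - R \right].$$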
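On finite alphabets, points on the distortion-rate curve can be computed by alternating minimization of the Lagrangian $\mathcal{L}_\mu(\Phi, \nu, s)$ in its two arguments. This is the classical Blahut-Arimoto scheme from rate-distortion theory, which is not described in the paper and is named here only as an illustration. The sketch below assumes finite $\mathsf{X}$ and $\mathsf{U}$ and $s > 0$; the function name and the iteration count are choices made for this example. For fixed $\nu$ the minimizing kernel has the exponential form of (4.4), and for fixed $\Phi$ the minimizing $\nu$ is the marginal $\mu\Phi$, as in (2.3).

```python
import math

def blahut_arimoto(mu, c, s, iters=200):
    """Alternating minimization of s*D(mu⊗Phi || mu⊗nu) + <mu⊗Phi, c> over (Phi, nu).

    mu[x]: source law; c[x][u]: cost; s > 0: Lagrange multiplier (slope parameter).
    Returns (Phi, rate, cost) with rate = I(mu, Phi) in nats and cost = <mu⊗Phi, c>.
    """
    nx, nu_count = len(c), len(c[0])
    nu = [1.0 / nu_count] * nu_count          # initial guess for the action marginal
    for _ in range(iters):
        # Step 1: for fixed nu, the minimizing kernel is a Gibbs tilt of nu, cf. (4.4).
        Phi = []
        for x in range(nx):
            weights = [nu[u] * math.exp(-c[x][u] / s) for u in range(nu_count)]
            norm = sum(weights)
            Phi.append([w / norm for w in weights])
        # Step 2: for fixed Phi, the minimizing nu is the induced marginal mu Phi, cf. (2.3).
        nu = [sum(mu[x] * Phi[x][u] for x in range(nx)) for u in range(nu_count)]
    rate = sum(mu[x] * Phi[x][u] * math.log(Phi[x][u] / nu[u])
               for x in range(nx) for u in range(nu_count)
               if mu[x] * Phi[x][u] > 0.0)
    cost = sum(mu[x] * Phi[x][u] * c[x][u] for x in range(nx) for u in range(nu_count))
    return Phi, rate, cost

# Uniform binary state with Hamming cost c(x, u) = 1{x != u}.
mu = [0.5, 0.5]
c = [[0.0, 1.0], [1.0, 0.0]]
_, rate_tight, cost_tight = blahut_arimoto(mu, c, s=0.5)   # small s: more information
_, rate_loose, cost_loose = blahut_arimoto(mu, c, s=2.0)   # large s: less information
```

Sweeping $s$ over $(0, \infty)$ traces out the curve $R \mapsto D_\mu(R; c)$ below the critical rate; in this binary example each pair satisfies $R = \log 2 - h(D)$ with $D = 1/(1 + e^{1/s})$, where $h$ is the binary entropy in nats.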



5. Convex analytic approach for average-cost optimal control with rational
inattention. We now turn to the analysis of the average-cost control problem
(3.2a) with the information constraint (3.2b). In multi-stage control problems, such
as this one, the control law has a dual effect [2]: it affects both the cost at the current
stage and the uncertainty about the state at future stages. The presence of the
mutual information constraint (3.2b) enhances this dual effect, since it prevents the
DM from ever learning “too much” about the state. This, in turn, limits the DM’s 
future ability to keep the average cost low. These considerations suggest that, in 
order to bring rate-distortion theory to bear on the problem (3.2a), we cannot use the 
one-stage cost c as the distortion function. Instead, we must modify it to account for 
the effect of the control action on future costs. As we will see, this modification leads 
to a certain stochastic generalization of the Bellman Equation. 

5.1. Reduction to single-stage optimization. We begin by reducing the dynamic
optimization problem (3.2) to a particular static (single-stage) problem. Once
this has been carried out, we will be able to take advantage of the results of Section 4. 
The reduction is based on the so-called convex-analytic approach to controlled Markov 
processes [8] (see also [7,13,20,22]), which we briefly summarize here. 

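Suppose that we have a Markov control problem with initial state distribution
$\mu \in \mathcal{P}(\mathsf{X})$ and controlled transition kernel $Q \in \mathcal{M}(\mathsf{X}|\mathsf{X} \times \mathsf{U})$. Any MRS control law $\Phi$
induces a transition kernel $Q_\Phi \in \mathcal{M}(\mathsf{X}|\mathsf{X})$ on the state space $\mathsf{X}$:

$$Q_\Phi(A|x) \triangleq \int_{\mathsf{U}} Q(A|x, u)\, \Phi(du|x), \quad \forall A \in \mathcal{B}(\mathsf{X}).$$

We wish to find an MRS control law $\Phi^* \in \mathcal{M}(\mathsf{U}|\mathsf{X})$ that would minimize the long-term
average cost $J_\mu(\Phi)$ simultaneously for all $\mu$. With that in mind, let

$$J^* \triangleq \inf_{\mu \in \mathcal{P}(\mathsf{X})}\ \inf_{\Phi \in \mathcal{M}(\mathsf{U}|\mathsf{X})} \bar{J}_\mu(\Phi),$$

where $\bar{J}_\mu(\Phi)$ is the long-term expected average cost defined in Eq. (3.4). Under certain
regularity conditions, we can guarantee the existence of an MRS control law $\Phi^*$, such
that $J_\mu(\Phi^*) = J^*$ $P_\mu^{\Phi^*}$-a.s. for all $\mu \in \mathcal{P}(\mathsf{X})$. Moreover, this optimizing control law is
stable in the following sense: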

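Definition 5.1. An MRS control law $\Phi \in \mathcal{M}(\mathsf{U}|\mathsf{X})$ is called stable if:

• There exists at least one probability measure $\pi \in \mathcal{P}(\mathsf{X})$, which is invariant
w.r.t. $Q_\Phi$: $\pi = \pi Q_\Phi$.

• The average cost $\bar{J}_\pi(\Phi)$ is finite, and moreover

$$\bar{J}_\pi(\Phi) = \langle \Gamma_\Phi, c \rangle = \int_{\mathsf{X} \times \mathsf{U}} c(x, u)\, \Gamma_\Phi(dx, du), \quad \text{where } \Gamma_\Phi \triangleq \pi \otimes \Phi.$$

The subset of $\mathcal{M}(\mathsf{U}|\mathsf{X})$ consisting of all such stable control laws will be denoted by $\mathcal{K}$.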
Then we have the following [14, Thm. 5.7.9]:

Theorem 5.2. Suppose that the following assumptions are satisfied:

(A.1) The cost function $c$ is nonnegative, lower semicontinuous, and coercive.

(A.2) The cost function $c$ is inf-compact, i.e., for every $x \in \mathsf{X}$ and every $r \in \mathbb{R}$, the
set $\{u \in \mathsf{U} : c(x, u) \le r\}$ is compact.

(A.3) The kernel $Q$ is weakly continuous, i.e., $Qf \in C_b(\mathsf{X} \times \mathsf{U})$ for any $f \in C_b(\mathsf{X})$.

(A.4) There exist an MRS control law $\Phi$ and an initial state $x \in \mathsf{X}$, such that
$\bar{J}_{\delta_x}(\Phi) < \infty$.

Then there exists a control law $\Phi^* \in \mathcal{K}$, such that

$$J^* = \bar{J}_{\pi^*}(\Phi^*) = \inf_{\Phi \in \mathcal{K}} \langle \Gamma_\Phi, c \rangle, \tag{5.1}$$

where $\pi^* = \pi^* Q_{\Phi^*}$. Moreover, if $\Phi^*$ is such that the induced kernel $Q^* = Q_{\Phi^*}$ is
Harris-recurrent, then $J_\mu(\Phi^*) = J^*$ $P_\mu^{\Phi^*}$-a.s. for all $\mu \in \mathcal{P}(\mathsf{X})$.
One important consequence of the above theorem is that, if $\Phi^* \in \mathcal{K}$ achieves the
infimum on the rightmost side of (5.1) and if $\pi^*$ is the unique invariant distribution
of the Harris-recurrent Markov kernel $Q_{\Phi^*}$, then the state distributions $\mu_t$ induced
by $\Phi^*$ converge weakly to $\pi^*$ regardless of the initial condition $\mu_1 = \mu$. Moreover, the
theorem allows us to focus on the static optimization problem given by the right-hand
side of Eq. (5.1).
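The objects entering the static problem on the right-hand side of (5.1) are easy to compute for a finite model. The sketch below assumes a two-state, two-action controlled chain with made-up transition probabilities and costs, and it assumes the induced chain is irreducible and aperiodic so that iterating $\mu_{t+1} = \mu_t Q_\Phi$ converges; none of these numbers or function names come from the paper. It forms the induced kernel $Q_\Phi$, approximates its invariant distribution $\pi$ from two different initial conditions, and evaluates the steady-state cost $\langle \pi \otimes \Phi, c \rangle$.

```python
def induced_kernel(Q, Phi):
    """Q_Phi(y|x) = sum_u Q(y|x,u) Phi(u|x); Q[x][u] is the next-state law."""
    n = len(Q)
    return [[sum(Phi[x][u] * Q[x][u][y] for u in range(len(Phi[x])))
             for y in range(n)] for x in range(n)]

def push_forward(mu, P):
    """One step of the state law: mu_{t+1} = mu_t P."""
    return [sum(mu[x] * P[x][y] for x in range(len(mu))) for y in range(len(P[0]))]

def state_law(mu, P, steps):
    """State distribution after the given number of steps, starting from mu."""
    for _ in range(steps):
        mu = push_forward(mu, P)
    return mu

def steady_state_cost(pi, Phi, c):
    """<pi ⊗ Phi, c> = sum_{x,u} c(x,u) Phi(u|x) pi(x)."""
    return sum(pi[x] * Phi[x][u] * c[x][u]
               for x in range(len(pi)) for u in range(len(Phi[x])))

Q = [[[0.9, 0.1], [0.2, 0.8]],    # Q(.|x=0,u=0), Q(.|x=0,u=1)
     [[0.5, 0.5], [0.1, 0.9]]]    # Q(.|x=1,u=0), Q(.|x=1,u=1)
Phi = [[0.7, 0.3], [0.4, 0.6]]    # randomized stationary policy Phi(u|x)
c = [[0.0, 1.0], [2.0, 0.5]]      # one-step cost c(x,u)

Q_Phi = induced_kernel(Q, Phi)
pi_from_0 = state_law([1.0, 0.0], Q_Phi, 200)
pi_from_1 = state_law([0.0, 1.0], Q_Phi, 200)
avg_cost = steady_state_cost(pi_from_0, Phi, c)
```

Both starting points lead to the same limit, which mirrors the statement that the state distributions forget the initial condition when the induced kernel is suitably recurrent.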

Our next step is to introduce a steady-state form of the information constraint
(3.2b) and then to use ideas from rate-distortion theory to attack the resulting
optimization problem. The main obstacle to direct application of the results from
Section 4 is that the state distribution and the control policy in (5.1) are coupled through
the invariance condition $\pi_\Phi = \pi_\Phi Q_\Phi$. However, as we show next, it is possible to
decouple the information and the invariance constraints by introducing a function-valued
Lagrange multiplier to take care of the latter.

5.2. Bellman error minimization via marginal decomposition. We begin
by decomposing the infimum over $\Phi$ in (5.1) by first fixing the marginal state
distribution $\pi \in \mathcal{P}(\mathsf{X})$. To that end, for a given $\pi \in \mathcal{P}(\mathsf{X})$, we consider the set
of all stable control laws that leave it invariant (this set might very well be empty):
$\mathcal{K}_\pi \triangleq \{\Phi \in \mathcal{K} : \pi = \pi Q_\Phi\}$. In addition, for a given value $R \ge 0$ of the information
constraint, we consider the set $\mathcal{I}_\pi(R) = \{\Phi \in \mathcal{M}(\mathsf{U}|\mathsf{X}) : I(\pi, \Phi) \le R\}$ (recall Eq. (4.2)).

Assuming that the conditions of Theorem 5.2 are satisfied, we can rewrite the
expected ergodic cost (5.1) (in the absence of information constraints) as

$$J^* = \inf_{\Phi \in \mathcal{K}} \langle \Gamma_\Phi, c \rangle = \inf_{\pi \in \mathcal{P}(\mathsf{X})}\ \inf_{\Phi \in \mathcal{K}_\pi} \langle \pi \otimes \Phi, c \rangle. \tag{5.2}$$

In the same spirit, we can now introduce the following steady-state form of the
information-constrained control problem (3.2):

$$J^*(R) \triangleq \inf_{\pi \in \mathcal{P}(\mathsf{X})}\ \inf_{\Phi \in \mathcal{K}_\pi(R)} \langle \pi \otimes \Phi, c \rangle, \tag{5.3}$$

where the feasible set $\mathcal{K}_\pi(R) \triangleq \mathcal{K}_\pi \cap \mathcal{I}_\pi(R)$ accounts for both the invariance constraint
and the information constraint.

As a first step to understanding solutions to (5.3), we consider each candidate
invariant distribution $\pi \in \mathcal{P}(\mathsf{X})$ separately and define

$$J^*_\pi(R) \triangleq \inf_{\Phi \in \mathcal{K}_\pi(R)} \langle \pi \otimes \Phi, c \rangle \tag{5.4}$$

(we set the infimum to $+\infty$ if $\mathcal{K}_\pi(R) = \varnothing$). Now we follow the usual route in the theory of
average-cost optimal control [22, Ch. 9] and eliminate the invariance condition $\Phi \in \mathcal{K}_\pi$
by introducing a function-valued Lagrange multiplier:

Proposition 5.3. For any $\pi \in \mathcal{P}(\mathsf{X})$,

$$J^*_\pi(R) = \inf_{\Phi \in \mathcal{I}_\pi(R)}\ \sup_{h \in C_b(\mathsf{X})} \langle \pi \otimes \Phi, c + Qh - h \otimes 1 \rangle. \tag{5.5}$$

Remark 1. Both in (5.5) and elsewhere, we can extend the supremum over
$h \in C_b(\mathsf{X})$ to all $h \in L^1(\pi)$ without affecting the value of $J^*_\pi(R)$ (see, e.g., the
discussion of abstract minimax duality in [38, App. 1.3]).

Remark 2. Upon setting $\lambda_\pi = J^*_\pi(R)$, we can recognize the function $c + Qh -
h \otimes 1 - \lambda_\pi$ as the Bellman error associated with $h$; this object plays a central role in
approximate dynamic programming.
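Proof. Let $\iota_\pi(\Phi)$ take the value $0$ if $\Phi \in \mathcal{K}_\pi$ and $+\infty$ otherwise. Then

$$J^*_\pi(R) = \inf_{\Phi \in \mathcal{I}_\pi(R)} \big[ \langle \pi \otimes \Phi, c \rangle + \iota_\pi(\Phi) \big]. \tag{5.6}$$

Moreover,

$$\iota_\pi(\Phi) = \sup_{h \in C_b(\mathsf{X})} \big[ \langle \pi Q_\Phi, h \rangle - \langle \pi, h \rangle \big]. \tag{5.7}$$

Indeed, if $\Phi \in \mathcal{K}_\pi$, then the right-hand side of (5.7) is zero. On the other hand,
suppose that $\Phi \notin \mathcal{K}_\pi$. Since $\mathsf{X}$ is standard Borel, any two probability measures
$\mu, \nu \in \mathcal{P}(\mathsf{X})$ are equal if and only if $\langle \mu, h \rangle = \langle \nu, h \rangle$ for all $h \in C_b(\mathsf{X})$. Consequently,
$\langle \pi, h_0 \rangle \ne \langle \pi Q_\Phi, h_0 \rangle$ for some $h_0 \in C_b(\mathsf{X})$. There is no loss of generality if we assume
that $\langle \pi Q_\Phi, h_0 \rangle - \langle \pi, h_0 \rangle > 0$. Then by considering functions $h_n = n h_0$ for all $n =
1, 2, \ldots$ and taking the limit as $n \to \infty$, we can make the right-hand side of (5.7) grow
without bound. This proves (5.7). Substituting it into (5.6) and using the identities
$\langle \pi Q_\Phi, h \rangle = \langle \pi \otimes \Phi, Qh \rangle$ and $\langle \pi, h \rangle = \langle \pi \otimes \Phi, h \otimes 1 \rangle$, we get (5.5). □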
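The mechanism behind (5.7) can be seen numerically on a finite model. The sketch below reuses an illustrative two-state, two-action chain (the numbers and the helper name are assumptions made for this example, not taken from the paper). It evaluates the multiplier term $\langle \pi \otimes \Phi, Qh - h \otimes 1 \rangle = \langle \pi Q_\Phi, h \rangle - \langle \pi, h \rangle$, which vanishes when $\pi$ is invariant under $Q_\Phi$ and scales linearly in $h$ otherwise, so that the supremum over $h$ is infinite for a non-invariant $\pi$.

```python
def bellman_lagrange_term(pi, Phi, Q, h):
    """<pi ⊗ Phi, Qh - h ⊗ 1>, where Qh(x,u) = sum_y h(y) Q(y|x,u)."""
    total = 0.0
    for x in range(len(pi)):
        for u in range(len(Phi[x])):
            Qh_xu = sum(Q[x][u][y] * h[y] for y in range(len(h)))
            total += pi[x] * Phi[x][u] * (Qh_xu - h[x])
    return total

Q = [[[0.9, 0.1], [0.2, 0.8]],    # Q(.|x=0,u=0), Q(.|x=0,u=1)
     [[0.5, 0.5], [0.1, 0.9]]]    # Q(.|x=1,u=0), Q(.|x=1,u=1)
Phi = [[0.7, 0.3], [0.4, 0.6]]    # randomized stationary policy Phi(u|x)

pi_invariant = [26.0 / 57.0, 31.0 / 57.0]   # solves pi = pi Q_Phi for this Q and Phi
pi_other = [0.5, 0.5]                        # not invariant under Q_Phi
h0 = [-1.0, 2.0]                             # a test function on the state space

at_invariant = bellman_lagrange_term(pi_invariant, Phi, Q, h0)
at_other = bellman_lagrange_term(pi_other, Phi, Q, h0)
# Scaling h0 by n scales the term by n, so the supremum over h is unbounded.
scaled = [bellman_lagrange_term(pi_other, Phi, Q, [n * v for v in h0])
          for n in (1, 10, 100)]
```

This is the finite-state counterpart of the indicator $\iota_\pi(\Phi)$: the term acts as a penalty that is zero on the invariance constraint and can be driven to $+\infty$ off it.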

Armed with this proposition, we can express (5.3) in the form of an appropriate
rate-distortion problem by fixing $\pi$ and considering the dual value for (5.5):

$$J_{*,\pi}(R) \triangleq \sup_{h \in C_b(\mathsf{X})}\ \inf_{\Phi \in \mathcal{I}_\pi(R)} \langle \pi \otimes \Phi, c + Qh - h \otimes 1 \rangle. \tag{5.8}$$

Proposition 5.4. Suppose that assumption (A.1) above is satisfied, and that
$J^*_\pi(R) < \infty$. Then the primal value $J^*_\pi(R)$ and the dual value $J_{*,\pi}(R)$ are equal.

Proof. Let $\mathcal{P}_{\pi,c}(R) \subset \mathcal{P}(\mathsf{X} \times \mathsf{U})$ be the closure, in the weak topology, of the set
of all $\Gamma \in \mathcal{P}(\mathsf{X} \times \mathsf{U})$, such that $\Gamma(\cdot \times \mathsf{U}) = \pi(\cdot)$, $I(\Gamma) \le R$, and $\langle \Gamma, c \rangle \le J^*_\pi(R)$. Since
$J^*_\pi(R) < \infty$ by hypothesis, we can write

$$J^*_\pi(R) = \inf_{\Gamma \in \mathcal{P}_{\pi,c}(R)}\ \sup_{h \in C_b(\mathsf{X})} \langle \Gamma, c + Qh - h \otimes 1 \rangle \tag{5.9}$$

and

$$J_{*,\pi}(R) = \sup_{h \in C_b(\mathsf{X})}\ \inf_{\Gamma \in \mathcal{P}_{\pi,c}(R)} \langle \Gamma, c + Qh - h \otimes 1 \rangle. \tag{5.10}$$

Because $c$ is coercive and nonnegative, and $J^*_\pi(R) < \infty$, the set $\{\Gamma \in \mathcal{P}(\mathsf{X} \times \mathsf{U}) :
\langle \Gamma, c \rangle \le J^*_\pi(R)\}$ is tight [15, Proposition 1.4.15], so its closure is weakly sequentially
compact by Prohorov’s theorem. Moreover, because the function $\Gamma \mapsto I(\Gamma)$ is weakly
lower semicontinuous [25], the set $\{\Gamma : I(\Gamma) \le R\}$ is closed. Therefore, the set $\mathcal{P}_{\pi,c}(R)$
is closed and tight, hence weakly sequentially compact. Moreover, the sets $\mathcal{P}_{\pi,c}(R)$
and $C_b(\mathsf{X})$ are both convex, and the objective function on the right-hand side of (5.9)
is affine in $\Gamma$ and linear in $h$. Therefore, by Sion’s minimax theorem [31] we may
interchange the supremum and the infimum to conclude that $J^*_\pi(R) = J_{*,\pi}(R)$. □

We are now in a position to relate the optimal value $J^*_\pi(R) = J_{*,\pi}(R)$ to a suitable
rate-distortion problem. Recalling the definition in Eq. (4.3), for any $h \in C_b(\mathsf{X})$ we
consider the DRF of $\pi$ w.r.t. the distortion function $c + Qh$:

$$D_\pi(R; c + Qh) \triangleq \inf_{\Phi \in \mathcal{I}_\pi(R)} \langle \pi \otimes \Phi, c + Qh \rangle. \tag{5.11}$$

We can now give the following structural result:

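Theorem 5.5. Suppose that Assumptions (A.1)-(A.3) of Theorem 5.2 are in
force. Consider a probability measure $\pi \in \mathcal{P}(\mathsf{X})$ such that $J^*_\pi(R) < \infty$, and the
supremum over $h \in C_b(\mathsf{X})$ in (5.8) is attained by some $h_\pi$. Define the critical rate

$$R_{0,\pi} = \min\left\{ R \ge 0 : D_\pi(R; c + Qh_\pi) = \left\langle \pi, \inf_{u \in \mathsf{U}} [c(\cdot, u) + Qh_\pi(\cdot, u)] \right\rangle \right\}.$$

If $R < R_{0,\pi}$, then there exists an MRS control law $\Phi^*$ such that $I(\pi, \Phi^*) = R$,
and the Radon-Nikodym derivative of $\pi \otimes \Phi^*$ w.r.t. $\pi \otimes \pi\Phi^*$ takes the form

$$\frac{d(\pi \otimes \Phi^*)}{d(\pi \otimes \pi\Phi^*)}(x, u) = \frac{e^{-\frac{1}{s} d(x,u)}}{\int_{\mathsf{U}} e^{-\frac{1}{s} d(x,u')} \, \pi\Phi^*(du')}, \tag{5.12}$$

where $d(x, u) = c(x, u) + Qh_\pi(x, u)$, and $s \ge 0$ satisfies

$$D_\pi(R'; c + Qh_\pi) + sR' \ge D_\pi(R; c + Qh_\pi) + sR, \qquad \forall R' \ge 0. \tag{5.13}$$

If $R \ge R_{0,\pi}$, then the deterministic Markov policy $\Phi^*(du|x) = \delta_{u^*_\pi(x)}(du)$, where
$u^*_\pi(x)$ is any minimizer of $c(x,u) + Qh_\pi(x,u)$ over $u$, satisfies $I(\pi, \Phi^*) = R_{0,\pi}$. In
both cases, we have

$$J^*_\pi(R) + \langle \pi, h_\pi \rangle = \langle \pi \otimes \Phi^*, c + Qh_\pi \rangle = D_\pi(R; c + Qh_\pi). \tag{5.14}$$

Moreover, the optimal value $J^*_\pi(R)$ admits the following variational representation:

$$J^*_\pi(R) = \sup_{s \ge 0} \sup_{h \in C_b(\mathsf{X})} \inf_{\nu \in \mathcal{P}(\mathsf{U})} \left\{ -\langle \pi, h \rangle + s \left[ \left\langle \pi, \log \frac{1}{\int_{\mathsf{U}} e^{-\frac{1}{s}[c(\cdot,u) + Qh(\cdot,u)]} \, \nu(du)} \right\rangle - R \right] \right\}. \tag{5.15}$$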
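For finite state and action spaces, a kernel with the exponential form (5.12) can be computed with the classical Blahut-Arimoto iteration. The following is only a sketch under stated assumptions: the function names, the list-of-lists representation of the distortion `d`, and the binary example with Hamming distortion are illustrative choices and do not appear in the paper; `s` plays the role of the slope parameter in (5.13).

```python
import math

def blahut_arimoto(pi, d, s, iters=500):
    """Iterate phi(u|x) ∝ nu(u) * exp(-d[x][u] / s), nu = pi * phi (finite alphabets).

    Hypothetical helper: returns a kernel of the exponential form (5.12)
    together with the action marginal nu it was built from.
    """
    nx, nu_size = len(d), len(d[0])

    def tilt(nu):
        # Exponentially tilt the action marginal, then normalize each row.
        phi = []
        for x in range(nx):
            w = [nu[u] * math.exp(-d[x][u] / s) for u in range(nu_size)]
            z = sum(w)
            phi.append([wi / z for wi in w])
        return phi

    nu = [1.0 / nu_size] * nu_size
    for _ in range(iters):
        phi = tilt(nu)
        nu = [sum(pi[x] * phi[x][u] for x in range(nx)) for u in range(nu_size)]
    return tilt(nu), nu

def mutual_information(pi, phi):
    """I(pi, phi) in nats for a finite kernel phi."""
    nu = [sum(pi[x] * phi[x][u] for x in range(len(pi))) for u in range(len(phi[0]))]
    return sum(pi[x] * phi[x][u] * math.log(phi[x][u] / nu[u])
               for x in range(len(pi)) for u in range(len(phi[0])) if phi[x][u] > 0)

# Assumed toy example: binary state, Hamming distortion, slope s = 0.5.
pi = [0.7, 0.3]
d = [[0.0, 1.0], [1.0, 0.0]]
phi, nu = blahut_arimoto(pi, d, s=0.5)
rate = mutual_information(pi, phi)
distortion = sum(pi[x] * phi[x][u] * d[x][u] for x in range(2) for u in range(2))
print(rate, distortion)
```

Sweeping `s` traces out rate-distortion pairs; smaller `s` corresponds to larger rates, in line with the role of `s` as the slope in (5.13).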


Proof. Using Proposition 5.4 and the definition (5.8) of the dual value $J_{*,\pi}(R)$,
we can express $J^*_\pi(R)$ as a pointwise supremum of a family of DRF's:

$$J^*_\pi(R) = \sup_{h \in C_b(\mathsf{X})} \left[ D_\pi(R; c + Qh) - \langle \pi, h \rangle \right]. \tag{5.16}$$

Since $J^*_\pi(R) < \infty$, we can apply Proposition 4.1 separately for each $h \in C_b(\mathsf{X})$. Since
$Q$ is weakly continuous by hypothesis, $Qh \in C_b(\mathsf{X} \times \mathsf{U})$ for any $h \in C_b(\mathsf{X})$. In light
of these observations, and owing to our hypotheses, we can ensure that Assumptions
(D.1) and (D.2) of Proposition 4.1 are satisfied. In particular, we can take
$h_\pi \in C_b(\mathsf{X})$ that achieves the supremum in (5.16) (such an $h_\pi$ exists by hypothesis)
to deduce the existence of an MRS control law $\Phi^*$ that satisfies the information
constraint with equality and achieves (5.14). Using (4.4) with $d = c + Qh_\pi$,
we obtain (5.12). In the same way, (5.13) follows from (4.6) in Proposition 4.1. Finally,
the variational formula (5.15) for the optimal value can be obtained immediately from
(5.16) and Proposition 4.2. □

Note that the control law $\Phi^* \in \mathcal{M}(\mathsf{U}|\mathsf{X})$ characterized by Theorem 5.5 is not
guaranteed to be feasible (let alone optimal) for the optimization problem in Eq. (5.4).
However, if we add the invariance condition $\Phi^* \in \mathcal{K}_\pi$, then (5.14) provides a sufficient
condition for optimality:

Theorem 5.6. Fix a candidate invariant distribution $\pi \in \mathcal{P}(\mathsf{X})$. Suppose there
exist $h_\pi \in L^1(\pi)$, $\lambda_\pi < \infty$, and a stochastic kernel $\Phi^* \in \mathcal{K}_\pi(R)$ such that

$$\langle \pi, h_\pi \rangle + \lambda_\pi = \langle \pi \otimes \Phi^*, c + Qh_\pi \rangle = D_\pi(R; c + Qh_\pi). \tag{5.17}$$

Then $\Phi^* \in \mathcal{M}(\mathsf{U}|\mathsf{X})$ achieves the infimum in (5.4), and $J^*_\pi(R) = J_{*,\pi}(R) = \lambda_\pi$.

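Proof. First of all, using the fact that $\Phi^* \in \mathcal{K}_\pi$ together with (5.17), we can write

$$\langle \pi \otimes \Phi^*, c \rangle = \langle \pi \otimes \Phi^*, c + Qh_\pi - h_\pi \otimes 1 \rangle = \lambda_\pi. \tag{5.18}$$

From Proposition 5.3 and (5.17) we have

$$\begin{aligned}
J^*_\pi(R) &= \inf_{\Phi \in \mathcal{I}_\pi(R)} \sup_{h \in L^1(\pi)} \langle \pi \otimes \Phi, c + Qh - h \otimes 1 \rangle \\
&\ge \inf_{\Phi \in \mathcal{I}_\pi(R)} \langle \pi \otimes \Phi, c + Qh_\pi - h_\pi \otimes 1 \rangle \\
&= D_\pi(R; c + Qh_\pi) - \langle \pi, h_\pi \rangle \\
&= \lambda_\pi.
\end{aligned}$$

On the other hand, since $\Phi^* \in \mathcal{K}_\pi(R)$, we also have

$$\begin{aligned}
J^*_\pi(R) &= \inf_{\Phi \in \mathcal{I}_\pi(R)} \sup_{h \in L^1(\pi)} \langle \pi \otimes \Phi, c + Qh - h \otimes 1 \rangle \\
&\le \sup_{h \in L^1(\pi)} \langle \pi \otimes \Phi^*, c + Qh - h \otimes 1 \rangle \\
&= \langle \pi \otimes \Phi^*, c \rangle \\
&= \lambda_\pi,
\end{aligned}$$

where the last step follows from (5.18). This shows that $\langle \pi \otimes \Phi^*, c \rangle = \lambda_\pi = J^*_\pi(R)$,
and the optimality of $\Phi^*$ follows. □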

To complete the computation of the optimal steady-state value $J^*(R)$ defined in
(5.3), we need to consider all candidate invariant distributions $\pi \in \mathcal{P}(\mathsf{X})$ for which
$\mathcal{K}_\pi(R)$ is nonempty, and then choose among them any $\pi$ that attains the smallest
value of $J^*_\pi(R)$ (assuming this value is finite). On the other hand, if $J^*_\pi(R) < \infty$ for
some $\pi$, then Theorem 5.5 ensures that there exists a suboptimal control law satisfying
the information constraint in the steady state.

5.3. Information-constrained Bellman equation. The function $h_\pi$ that appears
in Theorems 5.5 and 5.6 arises as a Lagrange multiplier for the invariance
constraint $\Phi \in \mathcal{K}_\pi$. For a given invariant measure $\pi \in \mathcal{P}(\mathsf{X})$, it solves the fixed-point
equation

$$\langle \pi, h \rangle + \lambda_\pi = \inf_{\Phi \in \mathcal{I}_\pi(R)} \langle \pi \otimes \Phi, c + Qh \rangle \tag{5.19}$$

with $\lambda_\pi = J^*_\pi(R)$.

In the limit $R \to \infty$ (i.e., as the information constraint is relaxed), while also
minimizing over the invariant distribution $\pi$, the optimization problem (5.3) reduces
to the usual average-cost optimal control problem (5.2). Under appropriate conditions
on the model and the cost function, it is known that the solution to (5.2)
is obtained through the associated Average-Cost Optimality Equation (ACOE), or
Bellman Equation (BE)

$$h(x) + \lambda = \inf_{u \in \mathsf{U}} [c(x, u) + Qh(x, u)], \tag{5.20}$$

with $\lambda = J^*$. The function $h$ is known as the relative value function, and has the
same interpretation as a Lagrange multiplier.

Based on the similarity between (5.19) and (5.20), we refer to the former as
the Information-Constrained Bellman Equation (or IC-BE). However, while the BE
(5.20) gives a fixed-point equation for the relative value function $h$, the existence of a
solution pair $(h_\pi, \lambda_\pi)$ for the IC-BE (5.19) is only a sufficient condition for optimality.
By Theorem 5.6, the Markov kernel $\Phi^*$ that achieves the infimum on the right-hand
side of (5.19) must also satisfy the invariance condition $\Phi^* \in \mathcal{K}_\pi(R)$, which must be
verified separately.

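In spite of this technicality, the standard BE can be formally recovered in the limit
$R \to \infty$. To demonstrate this, first observe that $J^*_\pi(R)$ is the value of the following
(dual) optimization problem:

$$\begin{aligned}
&\text{maximize} && \lambda \\
&\text{subject to} && s \left\langle \pi, \log \frac{1}{\int_{\mathsf{U}} e^{-\frac{1}{s}[c(\cdot,u) + Qh(\cdot,u)]} \, \nu(du)} \right\rangle - \langle \pi, h \rangle \ge \lambda + sR, \quad \forall \nu \in \mathcal{P}(\mathsf{U}) \\
&&& \lambda \ge 0, \; s \ge 0, \; h \in L^1(\pi)
\end{aligned}$$

This follows from (5.15). From the fact that the DRF is convex and nonincreasing in
$R$, and from (5.13), taking $R \to \infty$ is equivalent to taking $s \to 0$ (with the convention
that $sR \to 0$ as $R \to \infty$). Now, Laplace's principle [12] states that, for any $\nu \in \mathcal{P}(\mathsf{U})$
and any measurable function $F : \mathsf{U} \to \mathbb{R}$ such that $e^{-F} \in L^1(\nu)$,

$$-\lim_{s \downarrow 0} s \log \int_{\mathsf{U}} e^{-\frac{1}{s} F(u)} \, \nu(du) = \nu\text{-}\operatorname*{ess\,inf}_{u \in \mathsf{U}} F(u).$$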
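The limit in Laplace's principle is easy to observe numerically on a finite action set, where the essential infimum is the minimum of $F$ over the atoms of $\nu$. The snippet below is a minimal sketch; the function name `soft_min` and the sample values of `F` and `nu` are assumptions made for illustration only.

```python
import math

def soft_min(F_values, weights, s):
    """Compute -s * log( sum_u weights[u] * exp(-F(u)/s) ) in a numerically stable way."""
    m = min(F_values)
    # Factor out exp(-m/s) so that the largest exponent is zero.
    acc = sum(w * math.exp(-(f - m) / s) for f, w in zip(F_values, weights))
    return m - s * math.log(acc)

# Assumed example: four actions with a probability vector nu.
F = [2.0, 0.5, 1.25, 3.0]
nu = [0.1, 0.2, 0.3, 0.4]
for s in (1.0, 0.1, 0.01, 0.001):
    print(s, soft_min(F, nu, s))
```

As `s` decreases, the printed values approach `min(F)` from above, mirroring the passage from the dual problem with slope `s` to the constraint involving the infimum over actions.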

Thus, the limit of $J^*_\pi(R)$ as $R \to \infty$ is the value of the optimization problem

$$\begin{aligned}
&\text{maximize} && \lambda \\
&\text{subject to} && \left\langle \pi, \inf_{u \in \mathsf{U}} [c(\cdot, u) + Qh(\cdot, u)] - h \right\rangle \ge \lambda, \\
&&& \lambda \ge 0, \; h \in L^1(\pi)
\end{aligned}$$

Performing now the minimization over $\pi \in \mathcal{P}(\mathsf{X})$ as well, we see that the limit of
$J^*(R)$ as $R \to \infty$ is given by the value of the following problem:

$$\begin{aligned}
&\text{maximize} && \lambda \\
&\text{subject to} && \inf_{u \in \mathsf{U}} [c(\cdot, u) + Qh(\cdot, u)] - h \ge \lambda, \quad \lambda \ge 0, \; h \in C(\mathsf{X})
\end{aligned}$$

which recovers the BE (5.20) (the restriction to continuous $h$ is justified by the fact
that continuous functions are dense in $L^1(\pi)$ for any finite Borel measure $\pi$). We
emphasize again that this derivation is purely formal, and is intended to illustrate
the conceptual relation between the information-constrained control problem and the
limiting case as $R \to \infty$.
5.4. Convergence of mutual information. So far, we have analyzed the
steady-state problem (5.3) and provided sufficient conditions for the existence of a
pair $(\pi, \Phi^*) \in \mathcal{P}(\mathsf{X}) \times \mathcal{K}_\pi(R)$, such that

$$J_\pi(\Phi^*) = \inf_{\Phi} J_\pi(\Phi) \quad \mathbb{P}^{\Phi^*}_\pi\text{-a.s.} \qquad \text{and} \qquad I(\pi, \Phi^*) = R \tag{5.21}$$

(here, $R$ is a given value of the information constraint). Turning to the average-cost
problem posed in Section 3, we can conclude from (5.21) that $\Phi^*$ solves (3.2) in the
special case $\mu = \pi$. In fact, in that case the state process $\{X_t\}$ is stationary Markov
with $\mu_t = \mathrm{Law}(X_t) = \pi$ for all $t$, so we have $I(\mu_t, \Phi^*) = I(\pi, \Phi^*) = R$ for all $t$.
However, what if the initial state distribution $\mu$ is different from $\pi$?

For example, suppose that the induced Markov kernel $Q_{\Phi^*} \in \mathcal{M}(\mathsf{X}|\mathsf{X})$ is weakly
ergodic, i.e., $\mu_t$ converges to $\pi$ weakly for any initial state distribution $\mu$. In that
case, $\mu_t \otimes \Phi^* \to \pi \otimes \Phi^*$ weakly as well. Unfortunately, the mutual information
functional is only lower semicontinuous in the weak topology, which gives

$$\liminf_{t \to \infty} I(\mu_t, \Phi^*) \ge I(\pi, \Phi^*) = R.$$

That is, while it is reasonably easy to arrange things so that $J_\mu(\Phi^*) = J^*(R)$ a.s., the
information constraint (3.2b) will not necessarily be satisfied. The following theorem
gives one sufficient condition:

Theorem 5.7. Fix a probability measure $\mu \in \mathcal{P}(\mathsf{X})$ and a stable MRS control law
$\Phi \in \mathcal{M}(\mathsf{U}|\mathsf{X})$, and let $\{(X_t, U_t)\}_{t=1}^\infty$ be the corresponding state-action Markov process
with $X_1 \sim \mu$. Suppose the following conditions are satisfied:

(I.1) The induced transition kernel $Q_\Phi$ is aperiodic and positive Harris recurrent
(and thus has a unique invariant probability measure $\pi = \pi Q_\Phi$).

(I.2) The sequence of information densities

$$i_t(x, u) \triangleq \log \frac{d(\mu_t \otimes \Phi)}{d(\mu_t \otimes \mu_t\Phi)}(x, u), \qquad t \ge 1$$

where $\mu_t = \mathbb{P}^\Phi_\mu(X_t \in \cdot)$, is uniformly integrable, i.e.,

$$\lim_{N \to \infty} \sup_{t \ge 1} \mathbb{E}^\Phi_\mu \left[ i_t(X_t, U_t) \, 1_{\{i_t(X_t, U_t) \ge N\}} \right] = 0. \tag{5.22}$$

Then $I(\mu_t, \Phi) \to I(\pi, \Phi)$ as $t \to \infty$.

Proof. Since $Q_\Phi$ is aperiodic and positive Harris recurrent, the sequence $\mu_t$
converges to $\pi$ in total variation (see [21, Thm. 13.0.1] or [15, Thm. 4.3.4]):

$$\|\mu_t - \pi\|_{\mathrm{TV}} = \sup_{A \in \mathcal{B}(\mathsf{X})} |\mu_t(A) - \pi(A)| \to 0 \quad \text{as } t \to \infty.$$

By the properties of the total variation distance, $\|\mu_t \otimes \Phi - \pi \otimes \Phi\|_{\mathrm{TV}} \to 0$ as
well. This, together with the uniform integrability assumption (5.22), implies that
$I(\mu_t, \Phi)$ converges to $I(\pi, \Phi)$ by a result of Dobrushin [11]. □

While it is relatively easy to verify the strong ergodicity condition (I.1), the
uniform integrability requirement (I.2) is fairly stringent, and is unlikely to hold
except in very special cases:

Example 1. Suppose that there exist nonnegative $\sigma$-finite measures $\lambda$ on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$
and $\rho$ on $(\mathsf{U}, \mathcal{B}(\mathsf{U}))$, such that the Radon-Nikodym derivatives

$$p(x) = \frac{d\mu}{d\lambda}(x), \qquad f(u|x) = \frac{d\Phi}{d\rho}(u|x), \qquad g(y|x) = \frac{dQ_\Phi}{d\lambda}(y|x) \tag{5.23}$$

exist, and there are constants $c, C > 0$, such that $c \le f(u|x) \le C$ for all $x \in \mathsf{X}$, $u \in \mathsf{U}$.
(This boundedness condition will hold only if each of the conditional probability
measures $\Phi(\cdot|x)$, $x \in \mathsf{X}$, is supported on a compact subset $S_x$ of $\mathsf{U}$, and $\rho(S_x)$ is
uniformly bounded.) Then the uniform integrability hypothesis (I.2) is fulfilled.

To see this, we first note that, for each $t$, both $\mu_t \otimes \Phi$ and $\mu_t \otimes \mu_t\Phi$ are absolutely
continuous w.r.t. the product measure $\lambda \otimes \rho$, with

$$\frac{d(\mu_t \otimes \Phi)}{d(\lambda \otimes \rho)}(x, u) = p_t(x) f(u|x) \qquad \text{and} \qquad \frac{d(\mu_t \otimes \mu_t\Phi)}{d(\lambda \otimes \rho)}(x, u) = p_t(x) q_t(u),$$

where $p_1 = p$, and for $t \ge 1$

$$p_{t+1}(x) = \frac{d\mu_{t+1}}{d\lambda}(x) = \int_{\mathsf{X}} p_t(x') g(x|x') \, \lambda(dx'),$$

$$q_t(u) = \frac{d(\mu_t\Phi)}{d\rho}(u) = \int_{\mathsf{X}} p_t(x) f(u|x) \, \lambda(dx).$$

This implies that we can express the information densities $i_t$ as

$$i_t(x, u) = \log \frac{f(u|x)}{q_t(u)}, \qquad (x, u) \in \mathsf{X} \times \mathsf{U}, \quad t = 1, 2, \ldots.$$

We then have the following bounds on $i_t$:

$$\log \frac{c}{C} \le i_t(x, u) \le \log f(u|x) - \int_{\mathsf{X}} p_t(x') \log f(u|x') \, \lambda(dx') \le \log \frac{C}{c},$$

where in the upper bound we have used Jensen's inequality. Therefore, the sequence of
random variables $\{i_t(X_t, U_t)\}_{t=1}^\infty$ is uniformly bounded, hence uniformly integrable.

In certain situations, we can dispense with both the strong ergodicity and the
uniform integrability requirements of Theorem 5.7:

Example 2. Let $\mathsf{X} = \mathsf{U} = \mathbb{R}$. Suppose that the control law $\Phi$ can be realized as
a time-invariant linear system

$$U_t = kX_t + W_t, \qquad t = 1, 2, \ldots \tag{5.24}$$

where $k \in \mathbb{R}$ is the gain, and where $\{W_t\}_{t=1}^\infty$ is a sequence of i.i.d. real-valued random
variables independent of $X_1$, such that $\nu = \mathrm{Law}(W_1)$ has finite mean $m$ and variance
$\sigma^2$, and satisfies

$$D(\nu \| N(m, \sigma^2)) < \infty, \tag{5.25}$$

where $N(m, \sigma^2)$ denotes a Gaussian probability measure with mean $m$ and variance
$\sigma^2$. Suppose also that the induced state transition kernel $Q_\Phi$ with invariant distribution
$\pi$ is weakly ergodic, so that $\mu_t \to \pi$ weakly, and additionally that

$$\lim_{t \to \infty} \int_{\mathsf{X}} (x - \langle \mu_t, x \rangle)^2 \, \mu_t(dx) = \int_{\mathsf{X}} (x - \langle \pi, x \rangle)^2 \, \pi(dx),$$

i.e., the variance of the state converges to its value under the steady-state distribution
$\pi$. Then $I(\mu_t, \Phi) \to I(\pi, \Phi)$ as an immediate consequence of Theorem 8 in [41].
6. Example: information-constrained LQG problem. We now illustrate
the general theory presented in the preceding section in the context of an
information-constrained version of the well-known Linear Quadratic Gaussian (LQG)
control problem. Consider the linear stochastic system

$$X_{t+1} = aX_t + bU_t + W_t, \qquad t \ge 1 \tag{6.1}$$

where $a, b \neq 0$ are the system coefficients, $\{X_t\}_{t=1}^\infty$ is a real-valued state process,
$\{U_t\}_{t=1}^\infty$ is a real-valued control process, and $\{W_t\}_{t=1}^\infty$ is a sequence of i.i.d. Gaussian
random variables with mean 0 and variance $\sigma^2$. The initial state $X_1$ has some given
distribution $\mu$. Here, $\mathsf{X} = \mathsf{U} = \mathbb{R}$, and the controlled transition kernel $Q \in \mathcal{M}(\mathsf{X}|\mathsf{X} \times \mathsf{U})$
corresponding to (6.1) is $Q(dy|x,u) = \gamma(y; ax + bu, \sigma^2)\,dy$, where $\gamma(\cdot\,; m, \sigma^2)$ is the
probability density of the Gaussian distribution $N(m, \sigma^2)$, and $dy$ is the Lebesgue
measure. We are interested in solving the information-constrained control problem
(3.2) with the quadratic cost $c(x, u) = px^2 + qu^2$ for some given $p, q > 0$.

Theorem 6.1. Suppose that the system (6.1) is open-loop stable, i.e., $a^2 < 1$.
Fix an information constraint $R > 0$. Let $m_1 = m_1(R)$ be the unique positive root of
the information-constrained discrete algebraic Riccati equation (IC-DARE)

$$p + m(a^2 - 1) + \frac{(mab)^2}{q + mb^2}\left(e^{-2R} - 1\right) = 0, \tag{6.2}$$

and let $m_2$ be the unique positive root of the standard DARE

$$p + m(a^2 - 1) - \frac{(mab)^2}{q + mb^2} = 0. \tag{6.3}$$

Define the control gains $k_1 = k_1(R)$ and $k_2$ by

$$k_i = -\frac{m_i ab}{q + m_i b^2} \tag{6.4}$$

and steady-state variances $\sigma_1^2 = \sigma_1^2(R)$ and $\sigma_2^2 = \sigma_2^2(R)$ by

$$\sigma_i^2 = \frac{\sigma^2}{1 - \left[e^{-2R}a^2 + (1 - e^{-2R})(a + bk_i)^2\right]}. \tag{6.5}$$

Then

$$J^*(R) \le \min\left( m_1\sigma^2, \; m_2\sigma^2 + (q + m_2b^2)k_2^2\sigma_2^2e^{-2R} \right). \tag{6.6}$$

Also, let $\Phi_1$ and $\Phi_2$ be two MRS control laws with Gaussian conditional densities

$$\varphi_i(u|x) = \frac{d\Phi_i(u|x)}{du} = \gamma\left(u; (1 - e^{-2R})k_i x, \; (1 - e^{-2R})e^{-2R}k_i^2\sigma_i^2\right), \tag{6.7}$$

and let $\pi_i = N(0, \sigma_i^2)$ for $i = 1, 2$. Then the first term on the right-hand side of (6.6)
is achieved by $\Phi_1$, the second term is achieved by $\Phi_2$, and $\Phi_i \in \mathcal{K}_{\pi_i}(R)$ for $i = 1, 2$.
In each case the information constraint is met with equality: $I(\pi_i, \Phi_i) = R$, $i = 1, 2$.
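The quantities in Theorem 6.1 are straightforward to evaluate numerically. The sketch below assumes the open-loop stable case $a^2 < 1$ and uses the observation that multiplying (6.2) by $q + mb^2$ gives a quadratic in $m$ with exactly one positive root; setting $e^{-2R}$ to zero in the same quadratic gives (6.3). The function names `riccati_root` and `lqg_bounds` are hypothetical, introduced here only for illustration.

```python
import math

def riccati_root(a, b, p, q, eps):
    """Positive root m of p + m(a^2-1) + (m a b)^2 (eps-1)/(q + m b^2) = 0.

    eps = exp(-2R) corresponds to the IC-DARE (6.2); eps = 0 to the DARE (6.3).
    Clearing the denominator gives A m^2 + B m + C = 0 with A < 0 < C when a^2 < 1.
    """
    A = b * b * (a * a * eps - 1.0)
    B = p * b * b + q * (a * a - 1.0)
    C = p * q
    disc = B * B - 4.0 * A * C
    return (-B - math.sqrt(disc)) / (2.0 * A)

def lqg_bounds(a, b, p, q, sigma2, R):
    """Return the two terms on the right-hand side of (6.6) and the (m, k, variance) triples."""
    eps = math.exp(-2.0 * R)
    roots = {1: riccati_root(a, b, p, q, eps), 2: riccati_root(a, b, p, q, 0.0)}
    info = {}
    for i, m in roots.items():
        k = -m * a * b / (q + m * b * b)                              # gain (6.4)
        contraction = eps * a * a + (1.0 - eps) * (a + b * k) ** 2
        info[i] = (m, k, sigma2 / (1.0 - contraction))                # variance (6.5)
    m2, k2, var2 = info[2]
    term1 = info[1][0] * sigma2
    term2 = m2 * sigma2 + (q + m2 * b * b) * k2 ** 2 * var2 * eps
    return term1, term2, info

# Parameters taken from the caption of Fig. 6.1: a = 0.995, b = 1, sigma^2 = 1, p = q = 1.
for R in (0.0, 0.1, 0.5, 2.0):
    t1, t2, _ = lqg_bounds(0.995, 1.0, 1.0, 1.0, 1.0, R)
    print(R, t1, t2)
```

At $R = 0$ both terms should coincide with the open-loop cost $p\sigma^2/(1 - a^2)$, and for large $R$ both should approach $m_2\sigma^2$, which gives a quick sanity check on any implementation.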

To gain some insight into the conclusions of Theorem 6.1, let us consider some
of its implications, and particularly the cases of no information ($R = 0$) and perfect
information ($R = +\infty$). First, when $R = 0$, the quadratic IC-DARE (6.2) reduces to
the linear Lyapunov equation [9] $p + m(a^2 - 1) = 0$, so the first term on the right-hand
side of (6.6) is $m_1(0)\sigma^2 = \frac{p\sigma^2}{1 - a^2}$. On the other hand, using Eqs. (6.3) and (6.4), we
can show that the second term is equal to the first term, so from (6.6)

$$J^*(0) \le \frac{p\sigma^2}{1 - a^2}. \tag{6.8}$$

Since this is also the minimal average cost in the open-loop case, we have equality in
(6.8). Also, both controllers $\Phi_1$ and $\Phi_2$ are realized by the deterministic open-loop
law $U_t \equiv 0$ for all $t$, as expected. Finally, the steady-state variance is
$\sigma_1^2(0) = \sigma_2^2(0) = \frac{\sigma^2}{1 - a^2}$, and $\pi_1 = \pi_2 = N(0, \sigma^2/(1 - a^2))$, which is the unique invariant distribution of
the system (6.1) with zero control (recall the stability assumption $a^2 < 1$). Second,
in the limit $R \to \infty$ the IC-DARE (6.2) reduces to the usual DARE (6.3). Hence,
$m_1(\infty) = m_2$, and both terms on the right-hand side of (6.6) are equal to $m_2\sigma^2$:

$$J^*(\infty) \le m_2\sigma^2. \tag{6.9}$$

Since this is the minimal average cost attainable in the scalar LQG control problem
with perfect information, we have equality in (6.9), as expected. The controllers $\Phi_1$
and $\Phi_2$ are again both deterministic and have the usual linear structure $U_t = k_2X_t$
for all $t$. The steady-state variance $\sigma_1^2(\infty) = \sigma_2^2(\infty) = \frac{\sigma^2}{1 - (a + bk_2)^2}$ is equal to the
steady-state variance induced by the optimal controller in the standard
(information-unconstrained) LQG problem.

When $0 < R < \infty$, the two control laws $\Phi_1$ and $\Phi_2$ are no longer the same.
However, they are both stochastic and have the form

$$U_t = k_i\left[(1 - e^{-2R})X_t + e^{-R}\sqrt{1 - e^{-2R}}\,V^{(i)}_t\right], \tag{6.10}$$

where $V^{(i)}_1, V^{(i)}_2, \ldots$ are i.i.d. $N(0, \sigma_i^2)$ random variables independent of $\{W_t\}_{t=1}^\infty$ and
$X_1$. The corresponding closed-loop system is

$$X_{t+1} = \left[a + (1 - e^{-2R})bk_i\right]X_t + Z^{(i)}_t, \tag{6.11}$$

where $Z^{(i)}_1, Z^{(i)}_2, \ldots$ are i.i.d. zero-mean Gaussian random variables with variance

$$\bar\sigma_i^2 = e^{-2R}(1 - e^{-2R})(bk_i)^2\sigma_i^2 + \sigma^2.$$

Theorem 6.1 implies that, for each $i = 1, 2$, this system is stable and has the invariant
distribution $\pi_i = N(0, \sigma_i^2)$. Moreover, this invariant distribution is unique, and
the closed-loop transition kernels $Q_{\Phi_i}$, $i = 1, 2$, are ergodic. We also note that the
two controllers in (6.10) can be realized as a cascade consisting of an additive white
Gaussian noise (AWGN) channel and a linear gain:

$$U_t = k_i\hat{X}^{(i)}_t, \qquad \hat{X}^{(i)}_t = (1 - e^{-2R})X_t + e^{-R}\sqrt{1 - e^{-2R}}\,V^{(i)}_t.$$

We can view the stochastic mapping from $X_t$ to $\hat{X}^{(i)}_t$ as a noisy sensor or state
observation channel that adds just enough noise to the state to satisfy the information
constraint in the steady state, while introducing a minimum amount of distortion.
The difference between the two control laws $\Phi_1$ and $\Phi_2$ is due to the fact that, for
$0 < R < \infty$, $k_1(R) \neq k_2$ and $\sigma_1^2(R) \neq \sigma_2^2(R)$. Note also that the deterministic
(linear gain) part of $\Phi_2$ is exactly the same as in the standard LQG problem with
perfect information, with or without noise. In particular, the gain $k_2$ is independent of
the information constraint $R$. Hence, $\Phi_2$ can be viewed as a certainty-equivalent control law which
treats the output $\hat{X}^{(2)}_t$ of the AWGN channel as the best representation of the state
$X_t$ given the information constraint. A control law with this structure was proposed
by Sims [28] on heuristic grounds for the information-constrained LQG problem with
discounted cost. On the other hand, for $\Phi_1$ both the noise variance $\sigma_1^2$ in the channel
$X_t \to \hat{X}^{(1)}_t$ and the gain $k_1$ depend on the information constraint $R$. Numerical
simulations show that $\Phi_1$ attains smaller steady-state cost for all sufficiently small
values of $R$ (see Figure 6.1), whereas $\Phi_2$ outperforms $\Phi_1$ when $R$ is large. As shown
above, the two controllers are exactly the same (and optimal) in the no-information
($R \to 0$) and perfect-information ($R \to \infty$) regimes.
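The claim that the AWGN sensor described above uses exactly $R$ nats while incurring mean-squared error $e^{-2R}$ times the state variance can be checked with the standard formula for the mutual information of a scalar Gaussian channel. This is a small illustrative sketch; the helper name `awgn_sensor` and the sample values of `v` and `R` are assumptions.

```python
import math

def awgn_sensor(v, R):
    """Rate and distortion of Xhat = (1 - e^{-2R}) X + e^{-R} sqrt(1 - e^{-2R}) V,
    with X ~ N(0, v) and V ~ N(0, v) independent (requires R > 0)."""
    eps = math.exp(-2.0 * R)
    gain = 1.0 - eps                      # coefficient multiplying X
    noise_var = eps * (1.0 - eps) * v     # variance of the additive noise term
    # Mutual information I(X; Xhat) of a scalar Gaussian channel, in nats.
    info = 0.5 * math.log(1.0 + gain ** 2 * v / noise_var)
    # Mean-squared error E[(X - Xhat)^2].
    mse = (1.0 - gain) ** 2 * v + noise_var
    return info, mse

info, mse = awgn_sensor(v=2.5, R=0.7)
print(info, mse)
```

Because the control $U_t = k_i\hat{X}^{(i)}_t$ is a one-to-one function of the sensor output when $k_i \neq 0$, the same rate applies to the pair $(X_t, U_t)$.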



Fig. 6.1. Comparison of $\Phi_1$ and $\Phi_2$ at low information rates and the difference $\Phi_2 - \Phi_1$ (dashed
line). System parameters: $a = 0.995$, $b = 1$, $\sigma^2 = 1$, cost parameters: $p = q = 1$.


In the unstable case ($a^2 > 1$), a simple sufficient condition for the existence of an
information-constrained controller that results in a stable closed-loop system is

$$R > \frac{1}{2} \log \frac{a^2 - (a + bk_2)^2}{1 - (a + bk_2)^2}, \tag{6.12}$$

where $k_2$ is given by (6.4). Indeed, if $R$ satisfies (6.12), then the steady-state variance
$\sigma_2^2$ is well-defined, so the closed-loop system (6.11) with $i = 2$ is stable.

6.1. Proof of Theorem 6.1. We will show that the pairs $(h_i, \lambda_i)$ with

$$h_1(x) = m_1x^2, \qquad \lambda_1 = m_1\sigma^2$$

$$h_2(x) = m_2x^2, \qquad \lambda_2 = m_2\sigma^2 + (q + m_2b^2)k_2^2\sigma_2^2e^{-2R}$$

both solve the IC-BE (5.19) for $\pi_i$, i.e.,

$$\langle \pi_i, h_i \rangle + \lambda_i = D_{\pi_i}(R; c + Qh_i), \tag{6.13}$$

and that the MRS control law $\Phi_i$ achieves the value of the distortion-rate function
in (6.13) and belongs to the set $\mathcal{K}_{\pi_i}(R)$. Then the desired results will follow from
Theorem 5.6. We split the proof into several logical steps.

Step 1: Existence, uniqueness, and closed-loop stability. We first demonstrate
that $m_1 = m_1(R)$ indeed exists and is positive, and that the steady-state variances
$\sigma_1^2$ and $\sigma_2^2$ are finite and positive. This will imply that the closed-loop system (6.11)
is stable and ergodic with the unique invariant distribution $\pi_i$. (Uniqueness and
positivity of $m_2$ follow from well-known results on the standard LQG problem.)

Lemma 6.2. For all $a, b \neq 0$ and all $p, q, R > 0$, Eq. (6.2) has a unique positive
root $m_1 = m_1(R)$.

Proof. It is a straightforward exercise in calculus to prove that the function

$$F(m) \triangleq p + ma^2 + \frac{(mab)^2}{q + mb^2}\left(e^{-2R} - 1\right)$$

is strictly increasing and concave for $m > -q/b^2$. Therefore, the fixed-point equation
$F(m) = m$ has a unique positive root $m_1(R)$. (See the proof of Proposition 4.1 in [5]
for a similar argument.) □

Lemma 6.3. For all $a, b \neq 0$ with $a^2 < 1$ and $p, q, R > 0$,

$$e^{-2R}a^2 + (1 - e^{-2R})(a + bk_i)^2 \in (0, 1), \qquad i = 1, 2. \tag{6.14}$$

Thus, the steady-state variance $\sigma_i^2 = \sigma_i^2(R)$ defined in (6.5) is finite and positive.
Proof. We write

$$e^{-2R}a^2 + (1 - e^{-2R})(a + bk_i)^2 = e^{-2R}a^2 + (1 - e^{-2R})\left[a\left(1 - \frac{m_ib^2}{q + m_ib^2}\right)\right]^2 < a^2,$$

where the equality uses (6.4) and the inequality follows from the fact that $q > 0$
and $m_i > 0$ (cf. Lemma 6.2). We get (6.14) from open-loop stability ($a^2 < 1$). □

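Step 2: A quadratic ansatz for the relative value function. Let $h(x) = mx^2$ for an
arbitrary $m > 0$. Then

$$Qh(x, u) = \int_{\mathsf{X}} h(y)\,Q(dy|x, u) = m(ax + bu)^2 + m\sigma^2 \tag{6.15}$$

and

$$c(x, u) + Qh(x, u) = m\sigma^2 + (q + mb^2)(u - \tilde{x})^2 + \left(p + ma^2 - \frac{(mab)^2}{q + mb^2}\right)x^2,$$

where we have set $\tilde{x} = -\frac{mab}{q + mb^2}\,x$. Therefore, for any $\pi \in \mathcal{P}(\mathsf{X})$ and any
$\Phi \in \mathcal{M}(\mathsf{U}|\mathsf{X})$, such that $\pi$ and $\pi\Phi$ have finite second moments, we have

$$\langle \pi \otimes \Phi, c + Qh - h \rangle = m\sigma^2 + \left(p + m(a^2 - 1) - \frac{(mab)^2}{q + mb^2}\right)\int_{\mathsf{X}} x^2\,\pi(dx) + (q + mb^2)\int_{\mathsf{X} \times \mathsf{U}} (u - \tilde{x})^2\,\pi(dx)\Phi(du|x).$$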


Step 3: Reduction to a static Gaussian rate-distortion problem. Now we consider
the Gaussian case $\pi = N(0, v)$ with an arbitrary $v > 0$. Then for any $\Phi \in \mathcal{M}(\mathsf{U}|\mathsf{X})$

$$\langle \pi \otimes \Phi, c + Qh - h \rangle = m\sigma^2 + \left(p + m(a^2 - 1) - \frac{(mab)^2}{q + mb^2}\right)v + (q + mb^2)\int_{\mathsf{X} \times \mathsf{U}} (u - \tilde{x})^2\,\pi(dx)\Phi(du|x).$$

We need to minimize the above over all $\Phi \in \mathcal{I}_\pi(R)$. If $X$ is a random variable with
distribution $\pi = N(0, v)$, then its scaled version

$$\tilde{X} = -\frac{mab}{q + mb^2}\,X \triangleq kX \tag{6.16}$$

has distribution $\tilde\pi = N(0, \tilde{v})$ with $\tilde{v} = k^2v$. Since the transformation $X \mapsto \tilde{X}$ is
one-to-one and the mutual information is invariant under one-to-one transformations [25],

$$D_\pi(R; c + Qh) - \langle \pi, h \rangle = \inf_{\Phi \in \mathcal{I}_\pi(R)} \langle \pi \otimes \Phi, c + Qh - h \rangle \tag{6.17}$$

$$= m\sigma^2 + \left(p + m(a^2 - 1) - \frac{(mab)^2}{q + mb^2}\right)v + (q + mb^2)\inf_{K \in \mathcal{I}_{\tilde\pi}(R)} \int_{\mathsf{X} \times \mathsf{U}} (u - \tilde{x})^2\,\tilde\pi(d\tilde{x})K(du|\tilde{x}). \tag{6.18}$$

We recognize the infimum in (6.18) as the DRF for the Gaussian distribution $\tilde\pi$ w.r.t.
the squared-error distortion $d(\tilde{x}, u) = (\tilde{x} - u)^2$. (See Appendix B for a summary of
standard results on the Gaussian DRF.) Hence,

$$\begin{aligned}
D_\pi(R; c + Qh) - \langle \pi, h \rangle
&= m\sigma^2 + \left(p + m(a^2 - 1) - \frac{(mab)^2}{q + mb^2}\right)v + (q + mb^2)\tilde{v}e^{-2R} \\
&= m\sigma^2 + \left(p + m(a^2 - 1) + \frac{(mab)^2}{q + mb^2}\left(e^{-2R} - 1\right)\right)v \qquad (6.19) \\
&= m\sigma^2 + \left(p + m(a^2 - 1) - \frac{(mab)^2}{q + mb^2}\right)v + (q + mb^2)k^2ve^{-2R}, \qquad (6.20)
\end{aligned}$$

where Eqs. (6.19) and (6.20) are obtained by collecting appropriate terms and using
the definition of $k$ from (6.16). We can now state the following result:

Lemma 6.4. Let $\pi_i = N(0, \sigma_i^2)$, $i = 1, 2$. Then the pair $(h_i, \lambda_i)$ solves the
information-constrained ACOE (6.13). Moreover, for each $i$ the controller $\Phi_i$ defined
in (6.7) achieves the DRF in (6.13) and belongs to the set $\mathcal{K}_{\pi_i}(R)$.

Proof. If we let $m = m_1$, then the second term in (6.19) is identically zero for any
$v$. Similarly, if we let $m = m_2$, then the second term in (6.20) is zero for any $v$. In
each case, the choice $v = \sigma_i^2$ gives (6.13). From the results on the Gaussian DRF (see
Appendix B), we know that, for a given $\tilde{v} > 0$, the infimum in (6.18) is achieved by

$$K^*(du|\tilde{x}) = \gamma\left(u; (1 - e^{-2R})\tilde{x}, \; e^{-2R}(1 - e^{-2R})\tilde{v}\right)du.$$

Setting $v = \sigma_i^2$ for $i = 1, 2$ and using $\tilde{x} = k_ix$ and $\tilde{v} = k_i^2\sigma_i^2$, we see that the infimum
over $\Phi$ in (6.17) in each case is achieved by composing the deterministic mapping

$$\tilde{x} = k_ix = -\frac{m_iab}{q + m_ib^2}\,x \tag{6.21}$$

with $K^*$. It is easy to see that this composition is precisely the stochastic control
law $\Phi_i$ defined in (6.7). Since the map (6.21) is one-to-one, we have $I(\pi_i, \Phi_i) =
I(\tilde\pi_i, K^*) = R$. Therefore, $\Phi_i \in \mathcal{I}_{\pi_i}(R)$.

It remains to show that $\Phi_i \in \mathcal{K}_{\pi_i}$, i.e., that $\pi_i$ is an invariant distribution of $Q_{\Phi_i}$.
This follows immediately from the fact that $Q_{\Phi_i}$ is realized as

$$Y = \left(a + bk_i(1 - e^{-2R})\right)X + bk_ie^{-R}\sqrt{1 - e^{-2R}}\,V^{(i)} + W,$$

where $V^{(i)} \sim N(0, \sigma_i^2)$ and $W \sim N(0, \sigma^2)$ are independent of one another and of $X$
[cf. (B.3)]. If $X \sim \pi_i$, then the variance of the output $Y$ is equal to

$$\left[a + bk_i(1 - e^{-2R})\right]^2\sigma_i^2 + (bk_i)^2e^{-2R}(1 - e^{-2R})\sigma_i^2 + \sigma^2 = \left[e^{-2R}a^2 + (1 - e^{-2R})(a + bk_i)^2\right]\sigma_i^2 + \sigma^2 = \sigma_i^2,$$

where the last step follows from (6.5). This completes the proof of the lemma. □
Putting together Lemmas 6.2-6.4 and using Theorem 5.6, we obtain Theorem 6.1.
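The invariance step in the proof of Lemma 6.4 reduces to a one-line variance identity, which can be verified numerically for any gain that keeps the closed loop stable. The following sketch uses hypothetical helper names and an arbitrary test gain `k` (not necessarily one of the gains in (6.4)); it checks that the variance in (6.5) is a fixed point of the closed-loop variance update and that the update converges to it.

```python
import math

def stationary_variance(a, b, k, sigma2, R):
    """Steady-state variance from (6.5) for a generic gain k."""
    eps = math.exp(-2.0 * R)
    return sigma2 / (1.0 - (eps * a * a + (1.0 - eps) * (a + b * k) ** 2))

def next_variance(v_state, v_noise, a, b, k, sigma2, R):
    """Variance of Y = (a + b k (1-eps)) X + b k e^{-R} sqrt(1-eps) V + W
    for independent X, V, W with variances v_state, v_noise, sigma2."""
    eps = math.exp(-2.0 * R)
    return ((a + b * k * (1.0 - eps)) ** 2 * v_state
            + (b * k) ** 2 * eps * (1.0 - eps) * v_noise
            + sigma2)

# Assumed illustrative parameters.
a, b, k, sigma2, R = 0.9, 1.0, -0.4, 1.0, 0.5
v_star = stationary_variance(a, b, k, sigma2, R)

# Start away from stationarity, keeping the sensor noise variance fixed at v_star.
v = 0.0
for _ in range(500):
    v = next_variance(v, v_star, a, b, k, sigma2, R)
print(v_star, v)
```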


7. Infinite-horizon discounted-cost problem. We now consider the problem
of rationally inattentive control subject to the infinite-horizon discounted-cost
criterion. This is the setting originally considered by Sims [28,29]. The approach followed
in that work was to select, for each time $t$, an observation channel that would provide
the best estimate $\hat{X}_t$ of the state $X_t$ under the information constraint, and then
invoke the principle of certainty equivalence to pick a control law that would map the
estimated state to the control $U_t$, such that the joint process $\{(X_t, \hat{X}_t, U_t)\}$ would
be stationary. On the other hand, the discounted-cost criterion by its very nature
places more emphasis on the transient behavior of the controlled process, since the
costs incurred at initial stages contribute the most to the overall expected cost. Thus,
even though the optimal control law may be stationary, the state process will not
be. With this in mind, we propose an alternative methodology that builds on the
convex-analytic approach and results in control laws that perform well not only in the
long term, but also in the transient regime.

In this section only, for ease of bookkeeping, we will start the time index at $t = 0$
instead of $t = 1$. As before, we consider a controlled Markov chain with transition
kernel $Q \in \mathcal{M}(\mathsf{X}|\mathsf{X} \times \mathsf{U})$ and initial state distribution $\mu \in \mathcal{P}(\mathsf{X})$ of $X_0$. However, we now
allow time-varying control strategies and refer to any sequence $\Phi = \{\Phi_t\}_{t=0}^\infty$ of Markov
kernels $\Phi_t \in \mathcal{M}(\mathsf{U}|\mathsf{X})$ as a Markov randomized (MR) control law. We let $\mathbb{P}^\Phi_\mu$ denote
the resulting process distribution of $\{(X_t, U_t)\}_{t=0}^\infty$, with the corresponding expectation
denoted by $\mathbb{E}^\Phi_\mu$. Given a measurable one-step state-action cost $c : \mathsf{X} \times \mathsf{U} \to \mathbb{R}$ and a
discount factor $0 < \beta < 1$, we can now define the infinite-horizon discounted cost as

$$J^\beta_\mu(\Phi) \triangleq \mathbb{E}^\Phi_\mu\left[\sum_{t=0}^\infty \beta^t c(X_t, U_t)\right].$$

Any MRS control law $\Phi \in \mathcal{M}(\mathsf{U}|\mathsf{X})$ corresponds to having $\Phi_t = \Phi$ for all $t$, and in
that case we will abuse the notation a bit and write $\mathbb{P}^\Phi_\mu$, $\mathbb{E}^\Phi_\mu$, and $J^\beta_\mu(\Phi)$. In addition,
we say that a control law $\Phi$ is Markov randomized quasistationary (MRQ) if there
exist two Markov kernels $\Phi^{(0)}, \Phi^{(1)} \in \mathcal{M}(\mathsf{U}|\mathsf{X})$ and a deterministic time $t_0 \in \mathbb{Z}_+$, such
that $\Phi_t$ is equal to $\Phi^{(0)}$ for $t < t_0$ and $\Phi^{(1)}$ for $t \ge t_0$.

We can now formulate the following information-constrained control problem:

$$\begin{aligned}
&\text{minimize} && J^\beta_\mu(\Phi) && (7.1\text{a}) \\
&\text{subject to} && I(\mu_t, \Phi_t) \le R, \quad \forall t \ge 0. && (7.1\text{b})
\end{aligned}$$

Here, as before, $\mu_t = \mathbb{P}^\Phi_\mu[X_t \in \cdot]$ is the distribution of the state at time $t$, and the
minimization is over all MRQ control laws $\Phi$.

7.1. Reduction to single-stage optimization. In order to follow the
convex-analytic approach as in Section 5.1, we need to write (7.1) as an expected value of
the cost $c$ with respect to an appropriately defined probability measure on $\mathsf{X} \times \mathsf{U}$. In
contrast to what we had for (3.2), the optimal solution here will depend on the initial
state distribution $\mu$. We impose the following assumptions:

(D.1) The state space $\mathsf{X}$ and the action space $\mathsf{U}$ are compact.

(D.2) The transition kernel $Q$ is weakly continuous.

(D.3) The cost function $c$ is nonnegative, lower semicontinuous, and bounded.

The essence of the convex-analytic approach to infinite-horizon discounted-cost
optimal control is in the following result [8]:

Proposition 7.1. For any MRS control law $\Phi \in \mathcal{M}(\mathsf{U}|\mathsf{X})$, we have

$$J^\beta_\mu(\Phi) = \frac{1}{1 - \beta}\langle \Gamma^\Phi_\mu, c \rangle,$$

where $\Gamma^\Phi_\mu \in \mathcal{P}(\mathsf{X} \times \mathsf{U})$ is the discounted occupation measure, defined by

$$\langle \Gamma^\Phi_\mu, f \rangle = (1 - \beta)\,\mathbb{E}^\Phi_\mu\left[\sum_{t=0}^\infty \beta^t f(X_t, U_t)\right], \qquad \forall f \in C_b(\mathsf{X} \times \mathsf{U}). \tag{7.2}$$

This measure can be disintegrated as $\Gamma^\Phi_\mu = \pi \otimes \Phi$, where $\pi \in \mathcal{P}(\mathsf{X})$ is the unique
solution of the equation

$$\pi = (1 - \beta)\mu + \beta\pi Q_\Phi. \tag{7.3}$$
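On a finite state space, Proposition 7.1 can be checked directly: the state marginal of the discounted occupation measure solves the fixed-point equation (7.3), and the discounted cost equals the occupation-measure average of the cost divided by $1 - \beta$. The sketch below assumes a closed-loop transition matrix `P` (playing the role of $Q_\Phi$) and a state-only cost vector `c_bar` (the cost already averaged over the control law); both, along with all function names, are illustrative assumptions.

```python
def vec_mat(v, P):
    """Row vector times matrix."""
    return [sum(v[i] * P[i][j] for i in range(len(P))) for j in range(len(P[0]))]

def occupation_by_fixed_point(mu, P, beta, iters=2000):
    """Iterate pi <- (1 - beta) mu + beta pi P, a contraction with modulus beta."""
    pi = list(mu)
    for _ in range(iters):
        pi = [(1.0 - beta) * m + beta * q for m, q in zip(mu, vec_mat(pi, P))]
    return pi

def occupation_by_series(mu, P, beta, horizon=2000):
    """Truncated series (1 - beta) * sum_t beta^t mu P^t."""
    pi = [0.0] * len(mu)
    mu_t, weight = list(mu), 1.0 - beta
    for _ in range(horizon):
        pi = [p + weight * m for p, m in zip(pi, mu_t)]
        mu_t = vec_mat(mu_t, P)
        weight *= beta
    return pi

def discounted_cost(mu, P, c_bar, beta, horizon=2000):
    """Truncated series sum_t beta^t <mu P^t, c_bar>."""
    total, mu_t, weight = 0.0, list(mu), 1.0
    for _ in range(horizon):
        total += weight * sum(m * c for m, c in zip(mu_t, c_bar))
        mu_t = vec_mat(mu_t, P)
        weight *= beta
    return total

# Assumed three-state example.
P = [[0.5, 0.5, 0.0], [0.1, 0.6, 0.3], [0.2, 0.0, 0.8]]
mu = [1.0, 0.0, 0.0]
c_bar = [1.0, 4.0, 0.5]
beta = 0.9
pi_fp = occupation_by_fixed_point(mu, P, beta)
pi_series = occupation_by_series(mu, P, beta)
J = discounted_cost(mu, P, c_bar, beta)
print(pi_fp, J, sum(p * c for p, c in zip(pi_fp, c_bar)) / (1.0 - beta))
```

Note that the resulting `pi_fp` depends on the initial distribution `mu`, in contrast to the invariant distribution used in the average-cost setting.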


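It is well-known that, in the absence of information constraints, the minimum of
$J^\beta_\mu(\Phi)$ is achieved by an MRS policy. Thus, if we define the set

$$\mathcal{G}^\beta_\mu \triangleq \left\{\Gamma = \pi \otimes \Phi \in \mathcal{P}(\mathsf{X} \times \mathsf{U}) : \pi = (1 - \beta)\mu + \beta\pi Q_\Phi\right\},$$

then, by Proposition 7.1,

$$J^{\beta*}_\mu = \inf_{\Phi} J^\beta_\mu(\Phi) = \frac{1}{1 - \beta}\inf_{\Gamma \in \mathcal{G}^\beta_\mu} \langle \Gamma, c \rangle, \tag{7.4}$$

and if $\Gamma^* = \pi^* \otimes \Phi^*$ achieves the infimum, then $\Phi^*$ gives the optimal MRS control
law. We will also need the following approximation result: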

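Proposition 7.2. For any MRS control law $\Phi \in \mathcal{M}(\mathsf{U}|\mathsf{X})$ and any $\varepsilon > 0$, there
exists an MRQ control law $\Phi^\varepsilon$ such that

$$J^\beta_\mu(\Phi^\varepsilon) \le J^\beta_\mu(\Phi) + \varepsilon, \tag{7.5}$$

and

$$I(\mu^\varepsilon_t, \Phi^\varepsilon_t) \le \frac{C}{(1 - \beta)^2\varepsilon}\,I(\pi, \Phi), \qquad t = 0, 1, \ldots \tag{7.6}$$

where $\mu^\varepsilon_t = \mathbb{P}^{\Phi^\varepsilon}_\mu(X_t \in \cdot)$, $\pi \in \mathcal{P}(\mathsf{X})$ is given by (7.3), and $C = \max_{x \in \mathsf{X}} \max_{u \in \mathsf{U}} c(x, u)$.

Proof. Given an MRS $\Phi$, we construct $\Phi^\varepsilon$ as follows:

$$\Phi^\varepsilon_t(du|x) = \begin{cases} \Phi(du|x), & t < t_*, \\ \delta_{u_0}(du), & t \ge t_*, \end{cases}$$

where

$$t_* \triangleq \min\left\{t \in \mathbb{N} : \frac{C\beta^t}{1 - \beta} \le \varepsilon\right\} \tag{7.7}$$

and $u_0$ is an arbitrary point in $\mathsf{U}$. For each $t$, let $\mu^\varepsilon_t = \mathbb{P}^{\Phi^\varepsilon}_\mu(X_t \in \cdot)$. Then,
using the Markov property and the definition (7.7) of $t_*$, we have

$$J^\beta_\mu(\Phi^\varepsilon) = \mathbb{E}^\Phi_\mu\left[\sum_{t=0}^{t_* - 1} \beta^t c(X_t, U_t)\right] + \mathbb{E}^{\Phi^\varepsilon}_\mu\left[\sum_{t=t_*}^\infty \beta^t c(X_t, u_0)\right] \le J^\beta_\mu(\Phi) + \varepsilon,$$

which proves (7.5). To prove (7.6), we note that (7.2) implies that

$$\Gamma^\Phi_\mu = \pi \otimes \Phi = \left[(1 - \beta)\sum_{t=0}^\infty \beta^t \mu Q_\Phi^t\right] \otimes \Phi.$$

Therefore, since the mutual information $I(\nu, K)$ is concave in $\nu$, we have

$$\begin{aligned}
I(\pi, \Phi) &\ge (1 - \beta)\sum_{t=0}^\infty \beta^t I(\mu Q_\Phi^t, \Phi) \\
&= (1 - \beta)\sum_{t=0}^{t_* - 1} \beta^t I(\mu^\varepsilon_t, \Phi^\varepsilon_t) + (1 - \beta)\sum_{t=t_*}^\infty \beta^t I(\mu Q_\Phi^t, \Phi) \\
&\ge (1 - \beta)\sum_{t=0}^{t_* - 1} \beta^t I(\mu^\varepsilon_t, \Phi^\varepsilon_t) \\
&\ge (1 - \beta)\beta^t I(\mu^\varepsilon_t, \Phi^\varepsilon_t) \qquad \text{for each } t < t_*,
\end{aligned}$$

where we have also used the fact that the mutual information is nonnegative, as well
as the definition of $t_*$. This implies that, for $t < t_*$,

$$I(\mu^\varepsilon_t, \Phi^\varepsilon_t) \le \frac{I(\pi, \Phi)}{(1 - \beta)\beta^t} \le \frac{I(\pi, \Phi)}{(1 - \beta)\beta^{t_* - 1}} \le \frac{C}{(1 - \beta)^2\varepsilon}\,I(\pi, \Phi).$$

For $t \ge t_*$, $I(\mu^\varepsilon_t, \Phi^\varepsilon_t) = 0$, since at those time steps the control $U_t$ is independent of
the state $X_t$ by construction of $\Phi^\varepsilon$. □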
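The trade-off in Proposition 7.2 between the cost slack $\varepsilon$ and the inflation of the information rate can be made concrete with a few lines of arithmetic. The sketch below assumes that the switching time in (7.7) is the first nonnegative integer $t$ at which the discounted tail bound $C\beta^t/(1 - \beta)$ falls to $\varepsilon$ or below; the function names and parameter values are illustrative assumptions.

```python
def switch_time(C, beta, eps):
    """Smallest integer t >= 0 with C * beta**t / (1 - beta) <= eps."""
    t = 0
    while C * beta ** t / (1.0 - beta) > eps:
        t += 1
    return t

def rate_inflation(C, beta, eps):
    """Factor C / ((1 - beta)^2 * eps) that multiplies I(pi, Phi) in (7.6)."""
    return C / ((1.0 - beta) ** 2 * eps)

# Assumed example: cost bound C = 1, discount beta = 0.9, slack eps = 0.1.
C, beta, eps = 1.0, 0.9, 0.1
t_star = switch_time(C, beta, eps)
factor = rate_inflation(C, beta, eps)
print(t_star, factor)

# To meet a per-step rate R with the truncated policy, the stationary policy
# may use at most R / factor nats in the sense of I(pi, Phi).
R = 2.0
print(R / factor)
```

Smaller values of `eps` push the switching time later and increase the inflation factor, so the stationary rate budget shrinks linearly in `eps`.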

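As a consequence of Propositions 7.1 and 7.2, we can now focus on the following
static information-constrained problem:

$$\begin{aligned}
&\text{minimize} && \frac{1}{1 - \beta}\langle \Gamma, c \rangle && (7.8\text{a}) \\
&\text{subject to} && \Gamma \in \mathcal{G}^\beta_\mu, \quad I(\Gamma) \le \tilde{R} && (7.8\text{b})
\end{aligned}$$

(the information constraint $\tilde{R}$ will be related to the original value $R$ later). We will
denote the value of this optimization problem by $J^{\beta*}_\mu(\tilde{R})$.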
7.2. Marginal decomposition. We now follow more or less the same route as
we did in Section 5.2 for the average-cost case. Given $\pi \in \mathcal{P}(\mathsf{X})$, let us define the set

$$\mathcal{K}^\beta_{\mu,\pi} \triangleq \left\{\Phi \in \mathcal{M}(\mathsf{U}|\mathsf{X}) : \pi = (1 - \beta)\mu + \beta\pi Q_\Phi\right\}$$


(this set may very well be empty, but, for example, We can then 

decompose the infimum in (7.4) as 


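$$J^{\beta*}_\mu = \frac{1}{1 - \beta}\inf_{\Gamma \in \mathcal{G}^\beta_\mu} \langle \Gamma, c \rangle = \frac{1}{1 - \beta}\inf_{\pi \in \mathcal{P}(\mathsf{X})}\inf_{\Phi \in \mathcal{K}^\beta_{\mu,\pi}} \langle \pi \otimes \Phi, c \rangle. \tag{7.9}$$

If we further define $\mathcal{K}^\beta_{\mu,\pi}(\tilde{R}) \triangleq \mathcal{K}^\beta_{\mu,\pi} \cap \mathcal{I}_\pi(\tilde{R})$, then the value of the optimization
problem (7.8) will be given by

$$J^{\beta*}_\mu(\tilde{R}) = \inf_{\pi \in \mathcal{P}(\mathsf{X})} J^{\beta*}_{\mu,\pi}(\tilde{R}), \qquad \text{where } J^{\beta*}_{\mu,\pi}(\tilde{R}) \triangleq \frac{1}{1 - \beta}\inf_{\Phi \in \mathcal{K}^\beta_{\mu,\pi}(\tilde{R})} \langle \pi \otimes \Phi, c \rangle. \tag{7.10}$$

From here onward, the progress is very similar to what we had in Section 5.2, so we
omit the proofs for the sake of brevity. We first decouple the condition $\Phi \in \mathcal{K}^\beta_{\mu,\pi}$ from
the information constraint $\Phi \in \mathcal{I}_\pi(\tilde{R})$ by introducing a Lagrange multiplier:

Proposition 7.3. For any $\pi \in \mathcal{P}(\mathsf{X})$,

$$J^{\beta*}_{\mu,\pi}(\tilde{R}) = \frac{1}{1 - \beta}\inf_{\Phi \in \mathcal{I}_\pi(\tilde{R})}\sup_{h \in C_b(\mathsf{X})} \left[\langle \pi \otimes \Phi, c + \beta Qh - h \otimes 1 \rangle + (1 - \beta)\langle \mu, h \rangle\right]. \tag{7.11}$$

Since the cost $c$ is bounded, $J^{\beta*}_{\mu,\pi}(\tilde{R}) < \infty$, and we may interchange the order of the
infimum and the supremum with the same justification as in the average-cost case:

$$J^{\beta*}_{\mu,\pi}(\tilde{R}) = \frac{1}{1 - \beta}\sup_{h \in C_b(\mathsf{X})}\inf_{\Phi \in \mathcal{I}_\pi(\tilde{R})} \left[\langle \pi \otimes \Phi, c + \beta Qh - h \otimes 1 \rangle + (1 - \beta)\langle \mu, h \rangle\right]. \tag{7.12}$$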

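At this point, we have reduced our problem to the form that can be handled using
rate-distortion theory:

Theorem 7.4. Consider a probability measure $\pi \in \mathcal{P}(\mathsf{X})$, and suppose that the
supremum over $h \in C_b(\mathsf{X})$ in (5.8) is attained by some $h^\beta_{\mu,\pi}$. Then there exists an
MRS control law $\Phi^* \in \mathcal{M}(\mathsf{U}|\mathsf{X})$ such that $I(\pi, \Phi^*) = \tilde{R}$, and we have

$$J^{\beta*}_{\mu,\pi}(\tilde{R}) + \frac{1}{1 - \beta}\langle \pi, h^\beta_{\mu,\pi} \rangle - \langle \mu, h^\beta_{\mu,\pi} \rangle = \frac{1}{1 - \beta}\,D_\pi(\tilde{R}; c + \beta Qh^\beta_{\mu,\pi}). \tag{7.13}$$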

Conversely, if there exist a function $h^\beta_{\mu,\pi} \in L^1(\pi)$, a constant $\lambda^\beta_{\mu,\pi} > 0$, and a Markov
kernel $\Phi^* \in \mathcal{K}^\beta_{\mu,\pi}(\tilde{R})$, such that


(7.14) 
then and this value is achieved by $\Gamma^* = \pi \otimes \Phi^*$.

The gist of Theorem 7.4 is that the original dynamic control problem is reduced to
a static rate-distortion problem, where the distortion function is obtained by perturbing
the one-step cost $c(x,u)$ by the discounted value of the state-action pair $(x,u)$.

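Theorem 7.5. Given $R > 0$ and $\varepsilon > 0$, suppose that Eq. (7.14) admits a solution
triple $(h^\beta_{\mu,\pi}, \lambda^\beta_{\mu,\pi}, \Phi^*)$ with

$$\tilde{R} = \tilde{R}(\varepsilon, \beta) \triangleq \frac{(1 - \beta)^2\varepsilon}{C}\,R.$$

Let $\mathcal{Q}_\mu(R)$ denote the set of all MRQ control laws $\Phi$ satisfying the information
constraint (7.1b). Then

$$\inf_{\Phi \in \mathcal{Q}_\mu(R)} J^\beta_\mu(\Phi) \le \frac{1}{1 - \beta}\left[D_\pi\left(\tilde{R}(\varepsilon, \beta); c + \beta Qh^\beta_{\mu,\pi}\right) - \langle \pi, h^\beta_{\mu,\pi} \rangle\right] + \langle \mu, h^\beta_{\mu,\pi} \rangle + \varepsilon. \tag{7.15}$$

Proof. Given $\Phi^*$ and $\varepsilon > 0$, Proposition 7.2 guarantees the existence of an MRQ
control strategy $\Phi^{\varepsilon*}$ such that

$$J^\beta_\mu(\Phi^{\varepsilon*}) \le J^\beta_\mu(\Phi^*) + \varepsilon$$

and $I(\mu_t, \Phi^{\varepsilon*}_t) \le R$ for all $t \ge 0$. Thus, $\Phi^{\varepsilon*} \in \mathcal{Q}_\mu(R)$. Taking the infimum over all
$\Phi \in \mathcal{Q}_\mu(R)$ and using (7.14), we obtain (7.15). □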


Appendices 

Appendix A. Sufficiency of memoryless observation channels. 

In Sec. 3, we have focused our attention on information-constrained control
problems, in which the control action $U_t$ at each time $t$ is determined only on the basis
of the (noisy) observation $Z_t$ pertaining to the current state $X_t$. We also claimed
that this restriction to memoryless observation channels entails no loss of generality,
provided the control action at time $t$ is based only on $Z_t$ (i.e., the information
structure is amnesic in the terminology of [40]: the controller is forced to “forget”
$Z_1, \ldots, Z_{t-1}$ by time $t$). In this Appendix, we provide a rigorous justification of this
claim for a class of models that subsumes the set-up of Section 3. One should keep in 
mind, however, that this claim is unlikely to be valid when the controller has access 
to 

We consider the same model as in Section 3, except that we replace the model 
components (M.3) and (M.4) with 

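(M.3’) the observation channel, specified by a sequence $W$ of stochastic kernels
$W_t \in \mathcal{M}(\mathsf{Z}|\mathsf{X}^t \times \mathsf{Z}^{t-1} \times \mathsf{U}^{t-1})$, $t = 1, 2, \ldots$;

(M.4’) the feedback controller, specified by a sequence $\Phi$ of stochastic kernels
$\Phi_t \in \mathcal{M}(\mathsf{U}|\mathsf{Z})$, $t = 1, 2, \ldots$.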

We also consider a finite-horizon variant of the control problem (3.2). Thus, the
DM’s problem is to design a suitable channel $W$ and a controller $\Phi$ to minimize the
expected total cost over $T < \infty$ time steps subject to an information constraint:

$$\text{minimize} \quad \mathbb{E}^{W,\Phi}_{\mu}\left[\sum_{t=1}^{T} c(X_t, U_t)\right] \tag{A.1a}$$

$$\text{subject to} \quad I(X_t; Z_t) \le R, \quad t = 1, 2, \ldots, T \tag{A.1b}$$

The optimization problem (A.1) seems formidable: for each time step $t = 1, \ldots, T$ we
must design stochastic kernels $W_t(dz_t|x^t, z^{t-1}, u^{t-1})$ and $\Phi_t(du_t|z_t)$ for the
observation channel and the controller, and the complexity of the feasible set of $W_t$'s grows
with $t$. However, the fact that (a) both the controlled system and the controller
are Markov, and (b) the cost function at each stage depends only on the current
state-action pair, permits a drastic simplification: at each time $t$, we can limit our
search to memoryless channels $W_t(dz_t|x_t)$ without impacting either the expected cost
in (A.1a) or the information constraint in (A.1b):

Theorem A.1 (Memoryless observation channels suffice). For any controller
specification $\Phi$ and any channel specification $W$, there exists another channel
specification $W'$ consisting of stochastic kernels $W'_t(dz_t|x_t)$, $t = 1, 2, \ldots$, such that

$$\mathbb{E}\left[\sum_{t=1}^{T} c(X'_t, U'_t)\right] = \mathbb{E}\left[\sum_{t=1}^{T} c(X_t, U_t)\right] \quad \text{and} \quad I(X'_t; Z'_t) = I(X_t; Z_t), \quad t = 1, 2, \ldots, T,$$

where $\{(X_t, U_t, Z_t)\}$ is the original process with $(\mu, Q, W, \Phi)$, while $\{(X'_t, U'_t, Z'_t)\}$ is
the one with $(\mu, Q, W', \Phi)$.
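
To get a rough feel for how much the restriction to memoryless channels shrinks the search space, the following Python sketch counts the free parameters of a history-dependent kernel $W_t(dz_t|x^t, z^{t-1}, u^{t-1})$ and of a memoryless kernel $W_t(dz_t|x_t)$. It assumes finite alphabets of sizes `nx`, `nz`, `nu`, an illustrative simplification that the text does not make; the function names are hypothetical.

```python
def history_kernel_params(nx: int, nz: int, nu: int, t: int) -> int:
    """Free parameters of W_t(dz_t | x^t, z^{t-1}, u^{t-1}) on finite alphabets.

    Assumption (illustration only): |X| = nx, |Z| = nz, |U| = nu are finite.
    """
    # Number of distinct conditioning histories (x^t, z^{t-1}, u^{t-1}).
    n_histories = nx ** t * nz ** (t - 1) * nu ** (t - 1)
    # Each history indexes a pmf on Z, which has nz - 1 free entries.
    return n_histories * (nz - 1)


def memoryless_kernel_params(nx: int, nz: int) -> int:
    """Free parameters of a memoryless kernel W_t(dz_t | x_t)."""
    return nx * (nz - 1)


if __name__ == "__main__":
    for t in range(1, 6):
        print(t, history_kernel_params(2, 2, 2, t), memoryless_kernel_params(2, 2))
```

With binary alphabets the history-dependent count grows geometrically in $t$, while the memoryless count stays constant.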

Proof. To prove the theorem, we follow the approach used by Witsenhausen
in [39]. We start with the following simple observation that can be regarded as an
instance of the Shannon-Mori-Zwanzig Markov model [23]:

Lemma A.2 (Principle of Irrelevant Information). Let $\Xi, \Theta, \Psi, \Upsilon$ be four random
variables defined on a common probability space, such that $\Upsilon$ is conditionally
independent of $(\Theta, \Xi)$ given $\Psi$. Then there exist four random variables $\Xi', \Theta', \Psi', \Upsilon'$ defined
on the same spaces as the original tuple, such that $\Xi' \to \Theta' \to \Psi' \to \Upsilon'$ is a Markov
chain, and moreover the bivariate marginals agree:

$$\mathrm{Law}(\Xi, \Theta) = \mathrm{Law}(\Xi', \Theta'), \quad \mathrm{Law}(\Theta, \Psi) = \mathrm{Law}(\Theta', \Psi'), \quad \mathrm{Law}(\Psi, \Upsilon) = \mathrm{Law}(\Psi', \Upsilon').$$



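Proof. If we denote by $M(d\upsilon|\psi)$ the conditional distribution of $\Upsilon$ given $\Psi$ and by
$\Lambda(d\psi|\theta, \xi)$ the conditional distribution of $\Psi$ given $(\Theta, \Xi)$, then we can disintegrate
the joint distribution of $\Theta, \Xi, \Psi, \Upsilon$ as

$$P(d\theta, d\xi, d\psi, d\upsilon) = P(d\theta) P(d\xi|\theta) \Lambda(d\psi|\theta, \xi) M(d\upsilon|\psi).$$

If we define $\Lambda'(d\psi|\theta)$ by $\Lambda'(\cdot|\theta) = \int \Lambda(\cdot|\theta, \xi) P(d\xi|\theta)$, and let the tuple $(\Theta', \Xi', \Psi', \Upsilon')$
have the joint distribution

$$P'(d\theta, d\xi, d\psi, d\upsilon) = P(d\theta) P(d\xi|\theta) \Lambda'(d\psi|\theta) M(d\upsilon|\psi),$$

then it is easy to see that it has all of the desired properties. □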
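
The construction in the proof can be checked numerically. The sketch below is a minimal illustration under the assumption that all four variables take values in small finite sets (the lemma itself is stated for general spaces). It builds a random joint law of the form used in the proof, forms the averaged kernel, and compares the three bivariate marginals. All function names are hypothetical.

```python
import itertools
import random


def random_pmf(n, rng):
    """Return a strictly positive random pmf on {0, ..., n-1}."""
    w = [rng.random() + 0.1 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]


def build_joints(n_theta=3, n_xi=2, n_psi=3, n_ups=2, seed=0):
    """Build P and P' as dicts keyed by (theta, xi, psi, ups)."""
    rng = random.Random(seed)
    p_theta = random_pmf(n_theta, rng)
    p_xi = [random_pmf(n_xi, rng) for _ in range(n_theta)]   # P(xi | theta)
    lam = [[random_pmf(n_psi, rng) for _ in range(n_xi)]
           for _ in range(n_theta)]                           # Lambda(psi | theta, xi)
    m = [random_pmf(n_ups, rng) for _ in range(n_psi)]       # M(ups | psi)
    # Lambda'(psi | theta) = sum_xi Lambda(psi | theta, xi) P(xi | theta)
    lam_prime = [[sum(lam[th][xi][ps] * p_xi[th][xi] for xi in range(n_xi))
                  for ps in range(n_psi)] for th in range(n_theta)]
    P, P_prime = {}, {}
    for th, xi, ps, up in itertools.product(
            range(n_theta), range(n_xi), range(n_psi), range(n_ups)):
        base = p_theta[th] * p_xi[th][xi] * m[ps][up]
        P[(th, xi, ps, up)] = base * lam[th][xi][ps]
        P_prime[(th, xi, ps, up)] = base * lam_prime[th][ps]
    return P, P_prime


def marginal(joint, axes):
    """Marginalize a joint pmf onto the given coordinate axes."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[a] for a in axes)
        out[sub] = out.get(sub, 0.0) + p
    return out


def check_lemma(seed=0):
    """Return (largest marginal mismatch, largest violation of Xi' - Theta' - Psi')."""
    P, Pp = build_joints(seed=seed)
    # Axes: 0 = theta, 1 = xi, 2 = psi, 3 = ups.
    gaps = []
    for axes in [(1, 0), (0, 2), (2, 3)]:
        a, b = marginal(P, axes), marginal(Pp, axes)
        gaps.append(max(abs(a[k] - b[k]) for k in a))
    # Under P', Psi' is independent of Xi' given Theta':
    # P'(theta, xi, psi) * P'(theta) = P'(theta, xi) * P'(theta, psi).
    m3, m1 = marginal(Pp, (0, 1, 2)), marginal(Pp, (0,))
    m_tx, m_tp = marginal(Pp, (0, 1)), marginal(Pp, (0, 2))
    markov_gap = max(abs(m3[(th, xi, ps)] * m1[(th,)]
                         - m_tx[(th, xi)] * m_tp[(th, ps)])
                     for (th, xi, ps) in m3)
    return max(gaps), markov_gap


if __name__ == "__main__":
    print(check_lemma(seed=0))
```

Both returned numbers should be zero up to floating-point rounding. The remaining link of the chain, the conditional independence of $\Upsilon'$ from $(\Theta', \Xi')$ given $\Psi'$, holds by construction because the last factor depends on $\psi$ only.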

Using this principle, we can prove the following two lemmas: 

Lemma A.3 (Two-Stage Lemma). Suppose $T = 2$. Then the kernel
$W_2(dz_2|x^2, z_1, u_1)$ can be replaced by another kernel $W'_2(dz_2|x_2)$, such that the
resulting variables $(X'_t, Z'_t, U'_t)$, $t = 1, 2$, satisfy

$$\mathbb{E}[c(X'_1, U'_1) + c(X'_2, U'_2)] = \mathbb{E}[c(X_1, U_1) + c(X_2, U_2)]$$

and $I(X'_t; Z'_t) = I(X_t; Z_t)$, $t = 1, 2$.

Proof. Note that $Z_1$ only depends on $X_1$, and that only the second-stage expected
cost is affected by the choice of $W_2$. We can therefore apply the Principle of Irrelevant
Information to $\Theta = X_2$, $\Xi = (X_1, Z_1, U_1)$, $\Psi = Z_2$ and $\Upsilon = U_2$. Because both the
expected cost $\mathbb{E}[c(X_t, U_t)]$ and the mutual information $I(X_t; Z_t)$ depend only on the
corresponding bivariate marginals, the lemma is proved. □

Lemma A.4 (Three-Stage Lemma). Suppose $T = 3$, and $Z_3$ is conditionally
independent of $(X_i, Z_i, U_i)$, $i = 1, 2$, given $X_3$. Then the kernel $W_2(dz_2|x^2, z_1, u_1)$ can
be replaced by another kernel $W'_2(dz_2|x_2)$, such that the resulting variables $(X'_i, Z'_i, U'_i)$,
$i = 1, 2, 3$, satisfy

$$\mathbb{E}\left[\sum_{t=1}^{3} c(X'_t, U'_t)\right] = \mathbb{E}\left[\sum_{t=1}^{3} c(X_t, U_t)\right]$$

and $I(X'_t; Z'_t) = I(X_t; Z_t)$ for $t = 1, 2, 3$.

Proof. Again, $Z_1$ only depends on $X_1$, and only the second- and the third-stage
expected costs are affected by the choice of $W_2$. By the law of iterated expectation,

$$\mathbb{E}[c(X_3, U_3)] = \mathbb{E}\big[\mathbb{E}[c(X_3, U_3)|X_2, U_2]\big] = \mathbb{E}[h(X_2, U_2)],$$

where the functional form of $h(X_2, U_2) = \mathbb{E}[c(X_3, U_3)|X_2, U_2]$ is independent of the
choice of $W_2$, since for any fixed realizations $X_2 = x_2$ and $U_2 = u_2$ we have

$$\begin{aligned}
h(x_2, u_2) &= \int c(x_3, u_3) P(dx_3, du_3|x_2, u_2) \\
&= \int c(x_3, u_3) Q(dx_3|x_2, u_2) W_3(dz_3|x_3) \Phi_3(du_3|z_3)
\end{aligned}$$

by hypothesis. Therefore, applying the Principle of Irrelevant Information to $\Theta = X_2$,
$\Xi = (X_1, Z_1, U_1)$, $\Psi = Z_2$, and $\Upsilon = U_2$,

$$\begin{aligned}
\mathbb{E}[c(X'_2, U'_2) + c(X'_3, U'_3)] &= \mathbb{E}[c(X'_2, U'_2) + h(X'_2, U'_2)] \\
&= \mathbb{E}[c(X_2, U_2) + h(X_2, U_2)] \\
&= \mathbb{E}[c(X_2, U_2) + c(X_3, U_3)],
\end{aligned}$$

where the variables $(X'_t, Z'_t, U'_t)$ are obtained from the original ones by replacing
$W_2(dz_2|x^2, z_1, u_1)$ by $W'_2(dz_2|x_2)$. □

Armed with these two lemmas, we can now prove the theorem by backward induction
and grouping of variables. Fix any $T$. By the Two-Stage Lemma, we may assume
that $W_T$ is memoryless, i.e., $Z_T$ is conditionally independent of $X^{T-1}, Z^{T-1}, U^{T-1}$
given $X_T$. Now we apply the Three-Stage Lemma to

$$\underbrace{X^{T-3}, Z^{T-3}, U^{T-3}, X_{T-2}}_{\text{Stage 1 state}},\ \underbrace{Z_{T-2}}_{\text{Stage 1 observation}},\ \underbrace{U_{T-2}}_{\text{Stage 1 control}},\ \underbrace{X_{T-1}}_{\text{Stage 2 state}},\ \underbrace{Z_{T-1}}_{\text{Stage 2 observation}},\ \underbrace{U_{T-1}}_{\text{Stage 2 control}},\ \underbrace{X_T}_{\text{Stage 3 state}},\ \underbrace{Z_T}_{\text{Stage 3 observation}},\ \underbrace{U_T}_{\text{Stage 3 control}} \tag{A.2}$$

to replace $W_{T-1}(dz_{T-1}|x^{T-1}, z^{T-2}, u^{T-2})$ with $W'_{T-1}(dz_{T-1}|x_{T-1})$ without affecting
the expected cost or the mutual information between the state and the observation
at time $T - 1$. We proceed inductively by merging the second and the third stages in
(A.2), splitting the first stage in (A.2) into two, and then applying the Three-Stage
Lemma to replace the original observation kernel $W_{T-2}$ with a memoryless one. □

Appendix B. The Gaussian distortion-rate function. 
Given a Borel probability measure $\mu$ on the real line, we denote by $D_\mu(R)$ its
distortion-rate function w.r.t. the squared-error distortion $d(x, x') = (x - x')^2$:

$$D_\mu(R) = \inf_{K \in \mathcal{M}(\mathbb{R}|\mathbb{R}):\, I(\mu, K) \le R} \int (x - x')^2 \mu(dx) K(dx'|x). \tag{B.1}$$

Let $\mu = N(0, \sigma^2)$. Then we have the following [4]: the DRF is equal to $D_\mu(R) = \sigma^2 e^{-2R}$;
the optimal kernel $K^*$ that achieves the infimum in (B.1) has the form

$$K^*(dx'|x) = \gamma\big(x'; (1 - e^{-2R})x, (1 - e^{-2R}) e^{-2R} \sigma^2\big)\, dx'. \tag{B.2}$$

Moreover, it achieves the information constraint with equality, $I(\mu, K^*) = R$, and
can be realized as a stochastic linear system

$$X' = (1 - e^{-2R})X + e^{-R}\sqrt{1 - e^{-2R}}\, V, \tag{B.3}$$

where $V \sim N(0, \sigma^2)$ is independent of $X$.